You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/03/04 22:19:49 UTC

[GitHub] [daffodil] tuxji commented on a change in pull request #494: Support booleans, arrays of complex types, fixed values, fill bytes

tuxji commented on a change in pull request #494:
URL: https://github.com/apache/daffodil/pull/494#discussion_r586712336



##########
File path: daffodil-runtime2/src/main/resources/c/libcli/xml_reader.c
##########
@@ -123,6 +151,10 @@ strtofnum(const char *numptr, const char **errstrp)
     {
         *errstrp = "Found non-number characters in XML data";
     }
+    else if (value < minval || value > maxval || maxval < minval)

Review comment:
       Yes, the standard library of the C programming language has an [assert.h](https://en.wikipedia.org/wiki/Assert.h) header file that defines the C preprocessor macro assert().  I agree, let's start using the assert macro to check pre-conditions and invariants like `minval < maxval` since assertion failures will halt program execution immediately with a clear error message.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/Runtime2CodeGenerator.scala
##########
@@ -58,12 +61,13 @@ object Runtime2CodeGenerator
       case g: ElementCombinator => Runtime2CodeGenerator.generateCode(g.subComb, state)
       case g: SpecifiedLengthImplicit => Runtime2CodeGenerator.generateCode(g.eGram, state)
       case g: ScalarOrderedSequenceChild => Runtime2CodeGenerator.generateCode(g.term.termContentBody, state)
-      case g: SeqComp => seqCompGenerateCode(g, state)
-      case g: OrderedSequence => orderedSequenceGenerateCode(g, state)
-      case g: ElementParseAndUnspecifiedLength => elementParseAndUnspecifiedLengthGenerateCode(g, state)
-      case g: BinaryIntegerKnownLength => binaryIntegerKnownLengthGenerateCode(g, state)
-      case g: BinaryFloat => binaryFloatGenerateCode(g.e, 32, state)
+      case g: BinaryBoolean => binaryBooleanGenerateCode(g.e, state)
       case g: BinaryDouble => binaryFloatGenerateCode(g.e, 64, state)
+      case g: BinaryFloat => binaryFloatGenerateCode(g.e, 32, state)
+      case g: BinaryIntegerKnownLength => binaryIntegerKnownLengthGenerateCode(g, state)
+      case g: ElementParseAndUnspecifiedLength => elementParseAndUnspecifiedLengthGenerateCode(g, state)
+      case g: OrderedSequence => orderedSequenceGenerateCode(g, state)

Review comment:
       I don't think any of our test DFDL schemas actually have an ordered sequence in the right place to invoke orderedSequenceGenerateCode() yet.  I've never seen any code hit a breakpoint inside that function.  

##########
File path: project/Rat.scala
##########
@@ -122,6 +122,13 @@ object Rat {
     file("daffodil-tdml-lib/src/test/resources/test/tdml/test.txt"),
     file("daffodil-tdml-processor/src/test/resources/test/tdml/test.bin"),
     file("daffodil-tdml-processor/src/test/resources/test/tdml/test.txt"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/ISRM_green_to_orange_60000_parse_0.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/ISRM_green_to_orange_60000_parse_1.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/ISRM_orange_to_green_60002_parse.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/MPU_green_to_orange_60004_parse.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/MPU_orange_to_green_60006_parse_0.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/MPU_orange_to_green_60006_parse_1.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/aptina_settings_parse.dat"),

Review comment:
       No worries, I created all of these binary and XML files myself.  Sometimes I put all null bytes in the binary file, parsed it using the schema, got an XML file, edited the XML file to put some mock data into each element, and unparsed the XML file to get a better binary file.  Other times I started with another XML file, made it work, and unparsed it to get the binary file.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/parsers.c
##########
@@ -69,44 +113,72 @@
         }                                                                      \
     }
 
-// Define functions to parse binary real and integer numbers
+// Define functions to parse binary real numbers and integers
 
-define_parse_endian_real(be, double, 64)
+define_parse_endian_bool(be, 16);
+define_parse_endian_bool(be, 32);
+define_parse_endian_bool(be, 8);
 
+define_parse_endian_real(be, double, 64)
 define_parse_endian_real(be, float, 32)
 
-define_parse_endian_integer(be, uint, 64)
-
-define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, int, 16)
+define_parse_endian_integer(be, int, 32)
+define_parse_endian_integer(be, int, 64)
+define_parse_endian_integer(be, int, 8)
 
 define_parse_endian_integer(be, uint, 16)
-
+define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, uint, 64)
 define_parse_endian_integer(be, uint, 8)
 
-define_parse_endian_integer(be, int, 64)
-
-define_parse_endian_integer(be, int, 32)
-
-define_parse_endian_integer(be, int, 16)
-
-define_parse_endian_integer(be, int, 8)
+define_parse_endian_bool(le, 16);
+define_parse_endian_bool(le, 32);
+define_parse_endian_bool(le, 8);
 
 define_parse_endian_real(le, double, 64)
-
 define_parse_endian_real(le, float, 32)
 
-define_parse_endian_integer(le, uint, 64)
-
-define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, int, 16)
+define_parse_endian_integer(le, int, 32)
+define_parse_endian_integer(le, int, 64)
+define_parse_endian_integer(le, int, 8)
 
 define_parse_endian_integer(le, uint, 16)
-
+define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, uint, 64)
 define_parse_endian_integer(le, uint, 8)
 
-define_parse_endian_integer(le, int, 64)
-
-define_parse_endian_integer(le, int, 32)
-
-define_parse_endian_integer(le, int, 16)
-
-define_parse_endian_integer(le, int, 8)
+// Define function to parse fill bytes until end position is reached
+
+void
+parse_fill_bytes(size_t end_position, PState *pstate)
+{
+    while (!pstate->error_msg && pstate->position < end_position)
+    {
+        char   buffer;
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+
+        pstate->position += count;
+        if (count < sizeof(buffer))
+        {
+            pstate->error_msg = eof_or_error_msg(pstate->stream);
+        }
+    }
+}
+
+// Define function to validate number is same as fixed value after parse
+
+void
+parse_validate_fixed(bool same, const char *element, PState *pstate)
+{
+    if (!pstate->error_msg && !same)
+    {
+        // Error message would be easier to assemble and
+        // internationalize if we used an error struct with multiple
+        // fields instead of a const char string.
+        pstate->error_msg = "Parse: Value of element does not match value of "
+                            "its 'fixed' attribute";
+        (void)element; // unused because managing strings hard in embedded C

Review comment:
       Yes, `__attribute__ ((unused))` is a gcc extension which some C compilers do not support.  I've changed all `(void)(x)` to `UNUSED(x)` to make the code a little more self documenting.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/unparsers.c
##########
@@ -27,6 +28,38 @@
 
 // Macros to define unparse_<endian>_<type> functions
 
+#define define_unparse_endian_bool(endian, bits)                               \
+    void unparse_##endian##_bool##bits(bool number, int64_t true_rep,          \
+                                       uint32_t false_rep, UState *ustate)     \

Review comment:
       OK, that makes sense.  I've made that change as well.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/infoset.h
##########
@@ -108,6 +110,8 @@ typedef struct PState
 typedef struct UState
 {
     FILE *      stream;    // output to write data to
+    size_t      position;  // 0-based position in stream
+    char        fill_byte; // fill byte to write when needed

Review comment:
       I have removed fill_byte from struct UState and passed it directly to unparse_fill_bytes as you suggested further below.  Since we generate both the ERD initialization code and the unparse_fill_bytes calls, I'd rather pass the fill byte as a constant argument than complicate the ERD structure unnecessarily.  In Phase 1 (aka Runtime2P1), we don't support runtime DFDL expressions.  If we get to Phase 2 (aka Runtime2P2) where we add the DFDL expression language and runtime expressions, then we might need to calculate the fill byte and store its value in the ERD structure during program execution.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/parsers.c
##########
@@ -69,44 +113,72 @@
         }                                                                      \
     }
 
-// Define functions to parse binary real and integer numbers
+// Define functions to parse binary real numbers and integers
 
-define_parse_endian_real(be, double, 64)
+define_parse_endian_bool(be, 16);
+define_parse_endian_bool(be, 32);
+define_parse_endian_bool(be, 8);
 
+define_parse_endian_real(be, double, 64)
 define_parse_endian_real(be, float, 32)
 
-define_parse_endian_integer(be, uint, 64)
-
-define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, int, 16)
+define_parse_endian_integer(be, int, 32)
+define_parse_endian_integer(be, int, 64)
+define_parse_endian_integer(be, int, 8)
 
 define_parse_endian_integer(be, uint, 16)
-
+define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, uint, 64)
 define_parse_endian_integer(be, uint, 8)
 
-define_parse_endian_integer(be, int, 64)
-
-define_parse_endian_integer(be, int, 32)
-
-define_parse_endian_integer(be, int, 16)
-
-define_parse_endian_integer(be, int, 8)
+define_parse_endian_bool(le, 16);
+define_parse_endian_bool(le, 32);
+define_parse_endian_bool(le, 8);
 
 define_parse_endian_real(le, double, 64)
-
 define_parse_endian_real(le, float, 32)
 
-define_parse_endian_integer(le, uint, 64)
-
-define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, int, 16)
+define_parse_endian_integer(le, int, 32)
+define_parse_endian_integer(le, int, 64)
+define_parse_endian_integer(le, int, 8)
 
 define_parse_endian_integer(le, uint, 16)
-
+define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, uint, 64)
 define_parse_endian_integer(le, uint, 8)
 
-define_parse_endian_integer(le, int, 64)
-
-define_parse_endian_integer(le, int, 32)
-
-define_parse_endian_integer(le, int, 16)
-
-define_parse_endian_integer(le, int, 8)
+// Define function to parse fill bytes until end position is reached
+
+void
+parse_fill_bytes(size_t end_position, PState *pstate)
+{
+    while (!pstate->error_msg && pstate->position < end_position)
+    {
+        char   buffer;
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+
+        pstate->position += count;
+        if (count < sizeof(buffer))
+        {
+            pstate->error_msg = eof_or_error_msg(pstate->stream);
+        }
+    }
+}
+
+// Define function to validate number is same as fixed value after parse
+
+void
+parse_validate_fixed(bool same, const char *element, PState *pstate)
+{
+    if (!pstate->error_msg && !same)

Review comment:
       Well, callers are going to be calling runtime2 somewhat like this,
   
   ```c
               // Parse the input file into our infoset.
               PState pstate = {input, 0, NULL};
               root->erd->parseSelf(root, &pstate);
               continue_or_exit(pstate.error_msg);
   ```
   
   Inside parseSelf functions, especially for large structs or arrays, we may have hundreds of lines of code calling `parseSelf` or `parse_<endian>_<type>` one after another.  If we had to generate a second line after every call like `if (pstate->error_msg) return;`, we would double the number of lines and make it harder to see what's going on.  I think large DFDL schemas probably will generate more lines of calls in generated_code.c if we had to change the flow of control after every parser call just to avoid making another parser call if there is already an error.  Our parser functions also will be more robust and less likely to cause cascading errors if every parser function is wrapped in an `if (!pstate->error_msg)` block.
   
   In short, I think the flow of control looks simpler and cleaner and unwanted errors are less likely to happen.

##########
File path: daffodil-runtime2/src/main/resources/c/libcli/xml_reader.c
##########
@@ -292,48 +328,50 @@ xmlNumberElem(XMLReader *reader, const ERD *erd, void *numLocation)
             // Check for any errors getting the number
             const char *errstr = NULL;
 
-            // Handle varying bit lengths of both signed & unsigned integers and
-            // floating point numbers
+            // Handle varying bit lengths of both signed & unsigned numbers
             const enum TypeCode typeCode = erd->typeCode;
             switch (typeCode)
             {
-            case PRIMITIVE_UINT64:
-                *(uint64_t *)numLocation =
-                    (uint64_t)strtounum(number_from_xml, UINT64_MAX, &errstr);
+            case PRIMITIVE_BOOLEAN:
+                *(bool *)numLocation = strtobool(number_from_xml, &errstr);
                 break;
-            case PRIMITIVE_UINT32:
-                *(uint32_t *)numLocation =
-                    (uint32_t)strtounum(number_from_xml, UINT32_MAX, &errstr);
-                break;
-            case PRIMITIVE_UINT16:
-                *(uint16_t *)numLocation =
-                    (uint16_t)strtounum(number_from_xml, UINT16_MAX, &errstr);
+            case PRIMITIVE_FLOAT:
+                *(float *)numLocation = strtofnum(number_from_xml, &errstr);
                 break;
-            case PRIMITIVE_UINT8:
-                *(uint8_t *)numLocation =
-                    (uint8_t)strtounum(number_from_xml, UINT8_MAX, &errstr);
+            case PRIMITIVE_DOUBLE:
+                *(double *)numLocation = strtodnum(number_from_xml, &errstr);
                 break;
-            case PRIMITIVE_INT64:
-                *(int64_t *)numLocation = (int64_t)strtonum(
-                    number_from_xml, INT64_MIN, INT64_MAX, &errstr);
+            case PRIMITIVE_INT16:
+                *(int16_t *)numLocation = (int16_t)strtonum(
+                    number_from_xml, INT16_MIN, INT16_MAX, &errstr);
                 break;
             case PRIMITIVE_INT32:
                 *(int32_t *)numLocation = (int32_t)strtonum(
                     number_from_xml, INT32_MIN, INT32_MAX, &errstr);
                 break;
-            case PRIMITIVE_INT16:
-                *(int16_t *)numLocation = (int16_t)strtonum(
-                    number_from_xml, INT16_MIN, INT16_MAX, &errstr);
+            case PRIMITIVE_INT64:
+                *(int64_t *)numLocation = (int64_t)strtonum(
+                    number_from_xml, INT64_MIN, INT64_MAX, &errstr);
                 break;
             case PRIMITIVE_INT8:
                 *(int8_t *)numLocation = (int8_t)strtonum(
                     number_from_xml, INT8_MIN, INT8_MAX, &errstr);
                 break;
-            case PRIMITIVE_FLOAT:
-                *(float *)numLocation = strtofnum(number_from_xml, &errstr);
+            case PRIMITIVE_UINT16:
+                *(uint16_t *)numLocation =
+                    (uint16_t)strtounum(number_from_xml, UINT16_MAX, &errstr);
                 break;
-            case PRIMITIVE_DOUBLE:
-                *(double *)numLocation = strtodnum(number_from_xml, &errstr);
+            case PRIMITIVE_UINT32:
+                *(uint32_t *)numLocation =
+                    (uint32_t)strtounum(number_from_xml, UINT32_MAX, &errstr);
+                break;
+            case PRIMITIVE_UINT64:
+                *(uint64_t *)numLocation =
+                    (uint64_t)strtounum(number_from_xml, UINT64_MAX, &errstr);
+                break;
+            case PRIMITIVE_UINT8:
+                *(uint8_t *)numLocation =
+                    (uint8_t)strtounum(number_from_xml, UINT8_MAX, &errstr);

Review comment:
       Good point. I've changed "numLocation" to "number" to be consistent with parsers.[ch].

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/parsers.c
##########
@@ -69,44 +113,72 @@
         }                                                                      \
     }
 
-// Define functions to parse binary real and integer numbers
+// Define functions to parse binary real numbers and integers
 
-define_parse_endian_real(be, double, 64)
+define_parse_endian_bool(be, 16);
+define_parse_endian_bool(be, 32);
+define_parse_endian_bool(be, 8);
 
+define_parse_endian_real(be, double, 64)
 define_parse_endian_real(be, float, 32)
 
-define_parse_endian_integer(be, uint, 64)
-
-define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, int, 16)
+define_parse_endian_integer(be, int, 32)
+define_parse_endian_integer(be, int, 64)
+define_parse_endian_integer(be, int, 8)
 
 define_parse_endian_integer(be, uint, 16)
-
+define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, uint, 64)
 define_parse_endian_integer(be, uint, 8)
 
-define_parse_endian_integer(be, int, 64)
-
-define_parse_endian_integer(be, int, 32)
-
-define_parse_endian_integer(be, int, 16)
-
-define_parse_endian_integer(be, int, 8)
+define_parse_endian_bool(le, 16);
+define_parse_endian_bool(le, 32);
+define_parse_endian_bool(le, 8);
 
 define_parse_endian_real(le, double, 64)
-
 define_parse_endian_real(le, float, 32)
 
-define_parse_endian_integer(le, uint, 64)
-
-define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, int, 16)
+define_parse_endian_integer(le, int, 32)
+define_parse_endian_integer(le, int, 64)
+define_parse_endian_integer(le, int, 8)
 
 define_parse_endian_integer(le, uint, 16)
-
+define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, uint, 64)
 define_parse_endian_integer(le, uint, 8)
 
-define_parse_endian_integer(le, int, 64)
-
-define_parse_endian_integer(le, int, 32)
-
-define_parse_endian_integer(le, int, 16)
-
-define_parse_endian_integer(le, int, 8)
+// Define function to parse fill bytes until end position is reached
+
+void
+parse_fill_bytes(size_t end_position, PState *pstate)
+{
+    while (!pstate->error_msg && pstate->position < end_position)
+    {
+        char   buffer;
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+
+        pstate->position += count;
+        if (count < sizeof(buffer))
+        {
+            pstate->error_msg = eof_or_error_msg(pstate->stream);
+        }
+    }
+}
+
+// Define function to validate number is same as fixed value after parse
+
+void
+parse_validate_fixed(bool same, const char *element, PState *pstate)
+{
+    if (!pstate->error_msg && !same)
+    {
+        // Error message would be easier to assemble and
+        // internationalize if we used an error struct with multiple
+        // fields instead of a const char string.
+        pstate->error_msg = "Parse: Value of element does not match value of "
+                            "its 'fixed' attribute";

Review comment:
       We already planned to return Error struct objects instead of const char* string literals because an Error struct will make it easier to compose different data together and internationalize error messages.  We could store validation errors (maybe accumulate them up to a static fixed array's limit) in Error member field(s) even though that would mean we would need to distinguish whether an Error struct object has no errors or has parser errors, runtime SDEs, and/or validation errors.  Or we could store / gather up validation errors in a separate PState/UState member field rather than storing them in the Error struct itself so that parser code still can have `if (!pstate->error)` blocks.  Thoughts?

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/unparsers.c
##########
@@ -27,6 +28,38 @@
 
 // Macros to define unparse_<endian>_<type> functions
 
+#define define_unparse_endian_bool(endian, bits)                               \
+    void unparse_##endian##_bool##bits(bool number, int64_t true_rep,          \
+                                       uint32_t false_rep, UState *ustate)     \
+    {                                                                          \
+        if (!ustate->error_msg)                                                \
+        {                                                                      \
+            union                                                              \
+            {                                                                  \
+                char           c_val[sizeof(uint##bits##_t)];                  \
+                uint##bits##_t i_val;                                          \
+            } buffer;                                                          \
+            if (number)                                                        \
+            {                                                                  \
+                buffer.i_val =                                                 \
+                    (true_rep < 0) ? ~false_rep : (uint32_t)true_rep;          \
+            }                                                                  \
+            else                                                               \
+            {                                                                  \
+                buffer.i_val = false_rep;                                      \
+            }                                                                  \
+            buffer.i_val = hto##endian##bits(buffer.i_val);                    \
+            size_t count =                                                     \
+                fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);       \
+                                                                               \
+            ustate->position += count;                                         \
+            if (count < sizeof(buffer))                                        \
+            {                                                                  \
+                ustate->error_msg = eof_or_error_msg(ustate->stream);          \

Review comment:
       Let's simply change the error messages in eof_or_error_msg to say "Found eof indicator in stream, stopping now" or "Found error indicator in stream, stopping now".  Whether the error is happening during parsing or unparsing may be obvious from the daffodil command or TDML test name anyway.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/unparsers.c
##########
@@ -42,6 +75,8 @@
             buffer.i_val = hto##endian##bits(buffer.i_val);                    \
             size_t count =                                                     \
                 fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);       \
+                                                                               \
+            ustate->position += count;                                         \

Review comment:
       Good idea, just created helper macros `read_stream_update_position` in parsers.c and `write_stream_update_position` in unparsers.c to reduce duplication of that C code.  A helper function wouldn't have worked because callers would have needed to call multiple helper functions with different function signatures instead of calling a single helper macro.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryIntegerKnownLengthCodeGenerator.scala
##########
@@ -46,23 +46,32 @@ trait BinaryIntegerKnownLengthCodeGenerator {
       case 16 => "0xCCCC"
       case 32 => "0xCCCCCCCC"
       case 64 => "0xCCCCCCCCCCCCCCCC"
-      case _ => e.SDE("Lengths other than 8, 16, 32, or 64 bits are not supported.")
+      case _ => e.SDE("Integer lengths other than 8, 16, 32, or 64 bits are not supported.")
     }
     val fieldName = e.namedQName.local
-    val integer = if (g.signed) s"int${lengthInBits}" else s"uint${lengthInBits}"
     val conv = if (byteOrder eq ByteOrder.BigEndian) "be" else "le"
+    val prim = if (g.signed) s"int${lengthInBits}" else s"uint${lengthInBits}"
     val arraySize = if (e.occursCountKind == OccursCountKind.Fixed) e.maxOccurs else 0
+    val fixed = e.xml.attribute("fixed")
+    val fixedValue = if (fixed.isDefined) fixed.get.text else ""
 
-    def addSimpleTypeStatements(deref: String): Unit = {
+    def addStatements(deref: String): Unit = {
       val initStatement = s"    instance->$fieldName$deref = $initialValue;"
-      val parseStatement = s"    parse_${conv}_$integer(&instance->$fieldName$deref, pstate);"
-      val unparseStatement = s"    unparse_${conv}_$integer(instance->$fieldName$deref, ustate);"
+      val parseStatement = s"    parse_${conv}_$prim(&instance->$fieldName$deref, pstate);"
+      val unparseStatement = s"    unparse_${conv}_$prim(instance->$fieldName$deref, ustate);"
       cgState.addSimpleTypeStatements(initStatement, parseStatement, unparseStatement)
+
+      if (fixedValue.nonEmpty) {
+        val init2 = ""
+        val parse2 = s"""    parse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", pstate);"""
+        val unparse2 = s"""    unparse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", ustate);"""
+        cgState.addSimpleTypeStatements(init2, parse2, unparse2)
+      }
     }
     if (arraySize > 0)
       for (i <- 0 until arraySize)
-        addSimpleTypeStatements(s"[$i]")
+        addStatements(s"[$i]")
     else
-      addSimpleTypeStatements("")
+      addStatements("")
   }

Review comment:
       Yes, I deliberately made the code look as similar as possible so refactoring out the duplicate code later would be easier to do.  

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/unparsers.c
##########
@@ -63,51 +98,81 @@
             buffer.i_val = hto##endian##bits(number);                          \
             size_t count =                                                     \
                 fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);       \
+                                                                               \
+            ustate->position += count;                                         \
             if (count < sizeof(buffer))                                        \
             {                                                                  \
                 ustate->error_msg = eof_or_error_msg(ustate->stream);          \
             }                                                                  \
         }                                                                      \
     }
 
-// Define functions to unparse binary real and integer numbers
+// Define functions to unparse binary real numbers and integers
 
-define_unparse_endian_real(be, double, 64)
+define_unparse_endian_bool(be, 16);
+define_unparse_endian_bool(be, 32);
+define_unparse_endian_bool(be, 8);
 
+define_unparse_endian_real(be, double, 64)
 define_unparse_endian_real(be, float, 32)
 
-define_unparse_endian_integer(be, uint, 64)
-
-define_unparse_endian_integer(be, uint, 32)
+define_unparse_endian_integer(be, int, 16)
+define_unparse_endian_integer(be, int, 32)
+define_unparse_endian_integer(be, int, 64)
+define_unparse_endian_integer(be, int, 8)
 
 define_unparse_endian_integer(be, uint, 16)
-
+define_unparse_endian_integer(be, uint, 32)
+define_unparse_endian_integer(be, uint, 64)
 define_unparse_endian_integer(be, uint, 8)
 
-define_unparse_endian_integer(be, int, 64)
-
-define_unparse_endian_integer(be, int, 32)
-
-define_unparse_endian_integer(be, int, 16)
-
-define_unparse_endian_integer(be, int, 8)
+define_unparse_endian_bool(le, 16);
+define_unparse_endian_bool(le, 32);
+define_unparse_endian_bool(le, 8);
 
 define_unparse_endian_real(le, double, 64)
-
 define_unparse_endian_real(le, float, 32)
 
-define_unparse_endian_integer(le, uint, 64)
-
-define_unparse_endian_integer(le, uint, 32)
+define_unparse_endian_integer(le, int, 16)
+define_unparse_endian_integer(le, int, 32)
+define_unparse_endian_integer(le, int, 64)
+define_unparse_endian_integer(le, int, 8)
 
 define_unparse_endian_integer(le, uint, 16)
-
+define_unparse_endian_integer(le, uint, 32)
+define_unparse_endian_integer(le, uint, 64)
 define_unparse_endian_integer(le, uint, 8)
 
-define_unparse_endian_integer(le, int, 64)
-
-define_unparse_endian_integer(le, int, 32)
-
-define_unparse_endian_integer(le, int, 16)
-
-define_unparse_endian_integer(le, int, 8)
+// Define function to unparse fill bytes until end position is reached
+
+void
+unparse_fill_bytes(size_t end_position, UState *ustate)
+{
+    const char buffer = ustate->fill_byte;
+    while (!ustate->error_msg && ustate->position < end_position)
+    {
+        size_t count = fwrite(&buffer, 1, sizeof(buffer), ustate->stream);
+
+        ustate->position += count;
+        if (count < sizeof(buffer))
+        {
+            ustate->error_msg = eof_or_error_msg(ustate->stream);
+        }
+    }
+}

Review comment:
       Right, considered using a larger buffer when I wrote original code, decided to keep logic simple and optimize later if issue comes up.  Figured we might need to change parts of runtime2 code like this several times before they settle down anyway.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryBooleanCodeGenerator.scala
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.Assert
+import org.apache.daffodil.schema.annotation.props.gen.BitOrder
+import org.apache.daffodil.schema.annotation.props.gen.ByteOrder
+import org.apache.daffodil.schema.annotation.props.gen.OccursCountKind
+import passera.unsigned.ULong
+
+trait BinaryBooleanCodeGenerator {
+
+  def binaryBooleanGenerateCode(e: ElementBase, cgState: CodeGeneratorState): Unit = {
+    // For the time being this is a very limited back end.
+    // So there are some restrictions to enforce.
+    e.schemaDefinitionUnless(e.bitOrder eq BitOrder.MostSignificantBitFirst, "Only dfdl:bitOrder 'mostSignificantBitFirst' is supported.")
+    val byteOrder: ByteOrder = {
+      e.schemaDefinitionUnless(e.byteOrderEv.isConstant, "Runtime dfdl:byteOrder expressions not supported.")
+      e.byteOrderEv.constValue
+    }
+    val lengthInBits: Long = {
+      e.schemaDefinitionUnless(e.elementLengthInBitsEv.isConstant, "Runtime dfdl:length expressions not supported.")
+      val len = e.elementLengthInBitsEv.constValue.get
+      len match {
+        case 8 | 16 | 32 => len
+        case _ => e.SDE("Boolean lengths other than 8, 16, or 32 bits are not supported.")
+      }
+    }
+    Assert.invariant(e.binaryBooleanTrueRep.isEmpty || e.binaryBooleanTrueRep.getULong >= ULong(0))
+    Assert.invariant(e.binaryBooleanFalseRep >= ULong(0))
+
+    val initialValue = "true"
+    val fieldName = e.namedQName.local
+    val conv = if (byteOrder eq ByteOrder.BigEndian) "be" else "le"
+    val prim = s"bool$lengthInBits"
+    val trueRep = if (e.binaryBooleanTrueRep.isDefined) e.binaryBooleanTrueRep.getULong else -1
+    val falseRep = e.binaryBooleanFalseRep
+    val arraySize = if (e.occursCountKind == OccursCountKind.Fixed) e.maxOccurs else 0
+    val fixed = e.xml.attribute("fixed")
+    val fixedValue = if (fixed.isDefined) fixed.get.text else ""

Review comment:
       Is a DSOM getter significantly more efficient than `e.xml.attribute("fixed")` in a hot path?  Code generator isn't called frequently except in TDML tests, so we can put off the DSOM changes until we need fixed support in runtime1.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala
##########
@@ -316,41 +354,55 @@ class CodeGeneratorState {
   }
 
   def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
-    structs.top.initStatements += initStatement
-    structs.top.parserStatements += parseStatement
-    structs.top.unparserStatements += unparseStatement
+    if (initStatement.nonEmpty) structs.top.initStatements += initStatement
+    if (parseStatement.nonEmpty) structs.top.parserStatements += parseStatement
+    if (unparseStatement.nonEmpty) structs.top.unparserStatements += unparseStatement
   }

Review comment:
       To my knowledge, calling += with an empty string on a mutable.ArrayBuffer[String] would not be a no-op - it would literally add the empty string to the array and result in an empty line followed by a newline when calling mkString("\n").

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryBooleanCodeGenerator.scala
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.Assert
+import org.apache.daffodil.schema.annotation.props.gen.BitOrder
+import org.apache.daffodil.schema.annotation.props.gen.ByteOrder
+import org.apache.daffodil.schema.annotation.props.gen.OccursCountKind
+import passera.unsigned.ULong
+
+trait BinaryBooleanCodeGenerator {
+
+  def binaryBooleanGenerateCode(e: ElementBase, cgState: CodeGeneratorState): Unit = {
+    // For the time being this is a very limited back end.
+    // So there are some restrictions to enforce.
+    e.schemaDefinitionUnless(e.bitOrder eq BitOrder.MostSignificantBitFirst, "Only dfdl:bitOrder 'mostSignificantBitFirst' is supported.")
+    val byteOrder: ByteOrder = {
+      e.schemaDefinitionUnless(e.byteOrderEv.isConstant, "Runtime dfdl:byteOrder expressions not supported.")
+      e.byteOrderEv.constValue
+    }
+    val lengthInBits: Long = {
+      e.schemaDefinitionUnless(e.elementLengthInBitsEv.isConstant, "Runtime dfdl:length expressions not supported.")
+      val len = e.elementLengthInBitsEv.constValue.get
+      len match {
+        case 8 | 16 | 32 => len
+        case _ => e.SDE("Boolean lengths other than 8, 16, or 32 bits are not supported.")
+      }
+    }
+    Assert.invariant(e.binaryBooleanTrueRep.isEmpty || e.binaryBooleanTrueRep.getULong >= ULong(0))
+    Assert.invariant(e.binaryBooleanFalseRep >= ULong(0))
+
+    val initialValue = "true"
+    val fieldName = e.namedQName.local
+    val conv = if (byteOrder eq ByteOrder.BigEndian) "be" else "le"
+    val prim = s"bool$lengthInBits"
+    val trueRep = if (e.binaryBooleanTrueRep.isDefined) e.binaryBooleanTrueRep.getULong else -1
+    val falseRep = e.binaryBooleanFalseRep
+    val arraySize = if (e.occursCountKind == OccursCountKind.Fixed) e.maxOccurs else 0
+    val fixed = e.xml.attribute("fixed")
+    val fixedValue = if (fixed.isDefined) fixed.get.text else ""
+
+    def addStatements(deref: String): Unit = {
+      val initStatement = s"    instance->$fieldName$deref = $initialValue;"
+      val parseStatement = s"    parse_${conv}_$prim(&instance->$fieldName$deref, $trueRep, $falseRep, pstate);"
+      val unparseStatement = s"    unparse_${conv}_$prim(instance->$fieldName$deref, $trueRep, $falseRep, ustate);"
+      cgState.addSimpleTypeStatements(initStatement, parseStatement, unparseStatement)
+
+      if (fixedValue.nonEmpty) {
+        val init2 = ""
+        val parse2 = s"""    parse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", pstate);"""
+        val unparse2 = s"""    unparse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", ustate);"""
+        cgState.addSimpleTypeStatements(init2, parse2, unparse2)
+      }
+    }
+    if (arraySize > 0)
+      for (i <- 0 until arraySize)
+        addStatements(s"[$i]")
+    else
+      addStatements("")
+  }

Review comment:
       Yes, runtime2 supports only dfdl:occursCountKind 'fixed' in which case I know minOccurs must equal maxOccurs and that many occurrences are always expected.  If you look at line 54, arraySize always gets set to zero in any other situation.  I'm not sure where the best place to add a SDE would be, but once I start feeding some of Daffodil's TDML tests to runtime2, I should be able to see a place to add a SDE when needed.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala
##########
@@ -262,6 +288,18 @@ class CodeGeneratorState {
       structs.top.parserStatements += parseStatement
       structs.top.unparserStatements += unparseStatement
     }
+
+    // Implement padding if complex type has an explicit length
+    if (context.maybeFixedLengthInBits.isDefined && context.maybeFixedLengthInBits.get > 0) {
+      val octalFillByte = context.fillByteEv.constValue.toByte.toOctalString
+      val parseStatement = s"    parse_fill_bytes(end_position, pstate);"
+      val unparseStatement =
+        s"""    ustate->fill_byte = '\\$octalFillByte';
+           |    unparse_fill_bytes(end_position, ustate);""".stripMargin
+

Review comment:
       I knew runtime1 had put fill byte in the state so I thought runtime2 had to do it that way.  Now I'm passing the fill byte to unparse_fill_bytes thanks to your suggestion.  The fill byte doesn't need to be saved away in the ERD since nothing else uses it at this time.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala
##########
@@ -362,21 +414,72 @@ class CodeGeneratorState {
     structs.pop()
   }
 
+  // Gets length from explicit length declaration if any, otherwise from base type's implicit length
+  private def getLengthInBits(e: ElementBase): Long = {
+    e.schemaDefinitionUnless(e.elementLengthInBitsEv.isConstant, "Runtime dfdl:length expressions are not supported.")
+    e.elementLengthInBitsEv.constValue.get
+  }
+
+  // Because schema authors don't always get types right, allows explicit lengths to override implicit lengths

Review comment:
       Hmm, I'm making the assumption in runtime2 that the generated structs and the individual fields being parsed/unparsed should have the same types (`int32_t` in both places) if the schema calls for a 4-byte integer field.  If you really think someone may want different types in the generated structs than in the data, that is going to make the code generator's job more difficult (C compiler warnings or errors might be generated).  I've added a warning as you suggested.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala
##########
@@ -67,8 +66,21 @@ class CodeGeneratorState {
     val C = localName(context)
     val initStatements = structs.top.initStatements.mkString("\n")
     val initChoiceStatements = structs.top.initChoiceStatements.mkString("\n")
-    val parserStatements = structs.top.parserStatements.mkString("\n")
-    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val hasStatements = structs.top.parserStatements.nonEmpty
+    val parserStatements = if (hasStatements)
+      structs.top.parserStatements.mkString("\n")
+    else
+      s"""    // Empty struct, but need to prevent compiler warnings
+         |    (void)${C}_compute_offsets;
+         |    (void)instance;
+         |    (void)pstate;""".stripMargin
+    val unparserStatements = if (hasStatements)

Review comment:
       No need to check count of unparseStatements itself since runtime2 code generator has been generating both parseStatements and unparseStatements in every place so far (it's almost an invariant).  Don't want to generate empty string and then replace empty string with empty struct statements.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org