You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2021/03/01 20:04:38 UTC

[GitHub] [daffodil] stevedlawrence commented on a change in pull request #494: Support booleans, arrays of complex types, fixed values, fill bytes

stevedlawrence commented on a change in pull request #494:
URL: https://github.com/apache/daffodil/pull/494#discussion_r584941586



##########
File path: daffodil-runtime2/src/main/resources/c/libcli/xml_reader.c
##########
@@ -123,6 +151,10 @@ strtofnum(const char *numptr, const char **errstrp)
     {
         *errstrp = "Found non-number characters in XML data";
     }
+    else if (value < minval || value > maxval || maxval < minval)

Review comment:
       Does runtime2 have a concept of an invariant/assertions? Seems like the case of ``maxval < minval`` should be considered an assertion that implies there's a bug somewhere, rather than something like an SDE/unparser error (not sure how errstrp is used, but I assume it's used in error messages lik those?).

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/parsers.c
##########
@@ -69,44 +113,72 @@
         }                                                                      \
     }
 
-// Define functions to parse binary real and integer numbers
+// Define functions to parse binary real numbers and integers
 
-define_parse_endian_real(be, double, 64)
+define_parse_endian_bool(be, 16);
+define_parse_endian_bool(be, 32);
+define_parse_endian_bool(be, 8);
 
+define_parse_endian_real(be, double, 64)
 define_parse_endian_real(be, float, 32)
 
-define_parse_endian_integer(be, uint, 64)
-
-define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, int, 16)
+define_parse_endian_integer(be, int, 32)
+define_parse_endian_integer(be, int, 64)
+define_parse_endian_integer(be, int, 8)
 
 define_parse_endian_integer(be, uint, 16)
-
+define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, uint, 64)
 define_parse_endian_integer(be, uint, 8)
 
-define_parse_endian_integer(be, int, 64)
-
-define_parse_endian_integer(be, int, 32)
-
-define_parse_endian_integer(be, int, 16)
-
-define_parse_endian_integer(be, int, 8)
+define_parse_endian_bool(le, 16);
+define_parse_endian_bool(le, 32);
+define_parse_endian_bool(le, 8);
 
 define_parse_endian_real(le, double, 64)
-
 define_parse_endian_real(le, float, 32)
 
-define_parse_endian_integer(le, uint, 64)
-
-define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, int, 16)
+define_parse_endian_integer(le, int, 32)
+define_parse_endian_integer(le, int, 64)
+define_parse_endian_integer(le, int, 8)
 
 define_parse_endian_integer(le, uint, 16)
-
+define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, uint, 64)
 define_parse_endian_integer(le, uint, 8)
 
-define_parse_endian_integer(le, int, 64)
-
-define_parse_endian_integer(le, int, 32)
-
-define_parse_endian_integer(le, int, 16)
-
-define_parse_endian_integer(le, int, 8)
+// Define function to parse fill bytes until end position is reached
+
+void
+parse_fill_bytes(size_t end_position, PState *pstate)
+{
+    while (!pstate->error_msg && pstate->position < end_position)
+    {
+        char   buffer;
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+
+        pstate->position += count;
+        if (count < sizeof(buffer))
+        {
+            pstate->error_msg = eof_or_error_msg(pstate->stream);
+        }
+    }
+}
+
+// Define function to validate number is same as fixed value after parse
+
+void
+parse_validate_fixed(bool same, const char *element, PState *pstate)
+{
+    if (!pstate->error_msg && !same)
+    {
+        // Error message would be easier to assemble and
+        // internationalize if we used an error struct with multiple
+        // fields instead of a const char string.
+        pstate->error_msg = "Parse: Value of element does not match value of "
+                            "its 'fixed' attribute";

Review comment:
       Daffodil treats parser errors and validation errors as something different. Does this runtime essentially treat all such errors as fatal, (e.g. no backtracking caused by a parser error, no gathering up validation errors)?

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/Runtime2CodeGenerator.scala
##########
@@ -58,12 +61,13 @@ object Runtime2CodeGenerator
       case g: ElementCombinator => Runtime2CodeGenerator.generateCode(g.subComb, state)
       case g: SpecifiedLengthImplicit => Runtime2CodeGenerator.generateCode(g.eGram, state)
       case g: ScalarOrderedSequenceChild => Runtime2CodeGenerator.generateCode(g.term.termContentBody, state)
-      case g: SeqComp => seqCompGenerateCode(g, state)
-      case g: OrderedSequence => orderedSequenceGenerateCode(g, state)
-      case g: ElementParseAndUnspecifiedLength => elementParseAndUnspecifiedLengthGenerateCode(g, state)
-      case g: BinaryIntegerKnownLength => binaryIntegerKnownLengthGenerateCode(g, state)
-      case g: BinaryFloat => binaryFloatGenerateCode(g.e, 32, state)
+      case g: BinaryBoolean => binaryBooleanGenerateCode(g.e, state)
       case g: BinaryDouble => binaryFloatGenerateCode(g.e, 64, state)
+      case g: BinaryFloat => binaryFloatGenerateCode(g.e, 32, state)
+      case g: BinaryIntegerKnownLength => binaryIntegerKnownLengthGenerateCode(g, state)
+      case g: ElementParseAndUnspecifiedLength => elementParseAndUnspecifiedLengthGenerateCode(g, state)
+      case g: OrderedSequence => orderedSequenceGenerateCode(g, state)

Review comment:
       I'm surprised code cov says this line isn't covered by tests, considering you only support ordered sequences...

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryBooleanCodeGenerator.scala
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.Assert
+import org.apache.daffodil.schema.annotation.props.gen.BitOrder
+import org.apache.daffodil.schema.annotation.props.gen.ByteOrder
+import org.apache.daffodil.schema.annotation.props.gen.OccursCountKind
+import passera.unsigned.ULong
+
+trait BinaryBooleanCodeGenerator {
+
+  def binaryBooleanGenerateCode(e: ElementBase, cgState: CodeGeneratorState): Unit = {
+    // For the time being this is a very limited back end.
+    // So there are some restrictions to enforce.
+    e.schemaDefinitionUnless(e.bitOrder eq BitOrder.MostSignificantBitFirst, "Only dfdl:bitOrder 'mostSignificantBitFirst' is supported.")
+    val byteOrder: ByteOrder = {
+      e.schemaDefinitionUnless(e.byteOrderEv.isConstant, "Runtime dfdl:byteOrder expressions not supported.")
+      e.byteOrderEv.constValue
+    }
+    val lengthInBits: Long = {
+      e.schemaDefinitionUnless(e.elementLengthInBitsEv.isConstant, "Runtime dfdl:length expressions not supported.")
+      val len = e.elementLengthInBitsEv.constValue.get
+      len match {
+        case 8 | 16 | 32 => len
+        case _ => e.SDE("Boolean lengths other than 8, 16, or 32 bits are not supported.")
+      }
+    }
+    Assert.invariant(e.binaryBooleanTrueRep.isEmpty || e.binaryBooleanTrueRep.getULong >= ULong(0))
+    Assert.invariant(e.binaryBooleanFalseRep >= ULong(0))
+
+    val initialValue = "true"
+    val fieldName = e.namedQName.local
+    val conv = if (byteOrder eq ByteOrder.BigEndian) "be" else "le"
+    val prim = s"bool$lengthInBits"
+    val trueRep = if (e.binaryBooleanTrueRep.isDefined) e.binaryBooleanTrueRep.getULong else -1
+    val falseRep = e.binaryBooleanFalseRep
+    val arraySize = if (e.occursCountKind == OccursCountKind.Fixed) e.maxOccurs else 0
+    val fixed = e.xml.attribute("fixed")
+    val fixedValue = if (fixed.isDefined) fixed.get.text else ""
+
+    def addStatements(deref: String): Unit = {
+      val initStatement = s"    instance->$fieldName$deref = $initialValue;"
+      val parseStatement = s"    parse_${conv}_$prim(&instance->$fieldName$deref, $trueRep, $falseRep, pstate);"
+      val unparseStatement = s"    unparse_${conv}_$prim(instance->$fieldName$deref, $trueRep, $falseRep, ustate);"
+      cgState.addSimpleTypeStatements(initStatement, parseStatement, unparseStatement)
+
+      if (fixedValue.nonEmpty) {
+        val init2 = ""
+        val parse2 = s"""    parse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", pstate);"""
+        val unparse2 = s"""    unparse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", ustate);"""
+        cgState.addSimpleTypeStatements(init2, parse2, unparse2)
+      }
+    }
+    if (arraySize > 0)
+      for (i <- 0 until arraySize)
+        addStatements(s"[$i]")
+    else
+      addStatements("")
+  }

Review comment:
       Just curious, does runtime2 currently only support arrays with fixed occurres count size? Might be worth adding an SDE if that's the case for non fixed occurrances. 

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/parsers.c
##########
@@ -69,44 +113,72 @@
         }                                                                      \
     }
 
-// Define functions to parse binary real and integer numbers
+// Define functions to parse binary real numbers and integers
 
-define_parse_endian_real(be, double, 64)
+define_parse_endian_bool(be, 16);
+define_parse_endian_bool(be, 32);
+define_parse_endian_bool(be, 8);
 
+define_parse_endian_real(be, double, 64)
 define_parse_endian_real(be, float, 32)
 
-define_parse_endian_integer(be, uint, 64)
-
-define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, int, 16)
+define_parse_endian_integer(be, int, 32)
+define_parse_endian_integer(be, int, 64)
+define_parse_endian_integer(be, int, 8)
 
 define_parse_endian_integer(be, uint, 16)
-
+define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, uint, 64)
 define_parse_endian_integer(be, uint, 8)
 
-define_parse_endian_integer(be, int, 64)
-
-define_parse_endian_integer(be, int, 32)
-
-define_parse_endian_integer(be, int, 16)
-
-define_parse_endian_integer(be, int, 8)
+define_parse_endian_bool(le, 16);
+define_parse_endian_bool(le, 32);
+define_parse_endian_bool(le, 8);
 
 define_parse_endian_real(le, double, 64)
-
 define_parse_endian_real(le, float, 32)
 
-define_parse_endian_integer(le, uint, 64)
-
-define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, int, 16)
+define_parse_endian_integer(le, int, 32)
+define_parse_endian_integer(le, int, 64)
+define_parse_endian_integer(le, int, 8)
 
 define_parse_endian_integer(le, uint, 16)
-
+define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, uint, 64)
 define_parse_endian_integer(le, uint, 8)
 
-define_parse_endian_integer(le, int, 64)
-
-define_parse_endian_integer(le, int, 32)
-
-define_parse_endian_integer(le, int, 16)
-
-define_parse_endian_integer(le, int, 8)
+// Define function to parse fill bytes until end position is reached
+
+void
+parse_fill_bytes(size_t end_position, PState *pstate)
+{
+    while (!pstate->error_msg && pstate->position < end_position)
+    {
+        char   buffer;
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+
+        pstate->position += count;
+        if (count < sizeof(buffer))
+        {
+            pstate->error_msg = eof_or_error_msg(pstate->stream);
+        }
+    }
+}
+
+// Define function to validate number is same as fixed value after parse
+
+void
+parse_validate_fixed(bool same, const char *element, PState *pstate)
+{
+    if (!pstate->error_msg && !same)
+    {
+        // Error message would be easier to assemble and
+        // internationalize if we used an error struct with multiple
+        // fields instead of a const char string.
+        pstate->error_msg = "Parse: Value of element does not match value of "
+                            "its 'fixed' attribute";
+        (void)element; // unused because managing strings hard in embedded C

Review comment:
       Can `` __attribute__ ((unused))`` be used or are there portability issues with that? I've heard of people creating an UNUSED macro that does essentialy what you do but is a bit more self documenting.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/unparsers.c
##########
@@ -63,51 +98,81 @@
             buffer.i_val = hto##endian##bits(number);                          \
             size_t count =                                                     \
                 fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);       \
+                                                                               \
+            ustate->position += count;                                         \
             if (count < sizeof(buffer))                                        \
             {                                                                  \
                 ustate->error_msg = eof_or_error_msg(ustate->stream);          \
             }                                                                  \
         }                                                                      \
     }
 
-// Define functions to unparse binary real and integer numbers
+// Define functions to unparse binary real numbers and integers
 
-define_unparse_endian_real(be, double, 64)
+define_unparse_endian_bool(be, 16);
+define_unparse_endian_bool(be, 32);
+define_unparse_endian_bool(be, 8);
 
+define_unparse_endian_real(be, double, 64)
 define_unparse_endian_real(be, float, 32)
 
-define_unparse_endian_integer(be, uint, 64)
-
-define_unparse_endian_integer(be, uint, 32)
+define_unparse_endian_integer(be, int, 16)
+define_unparse_endian_integer(be, int, 32)
+define_unparse_endian_integer(be, int, 64)
+define_unparse_endian_integer(be, int, 8)
 
 define_unparse_endian_integer(be, uint, 16)
-
+define_unparse_endian_integer(be, uint, 32)
+define_unparse_endian_integer(be, uint, 64)
 define_unparse_endian_integer(be, uint, 8)
 
-define_unparse_endian_integer(be, int, 64)
-
-define_unparse_endian_integer(be, int, 32)
-
-define_unparse_endian_integer(be, int, 16)
-
-define_unparse_endian_integer(be, int, 8)
+define_unparse_endian_bool(le, 16);
+define_unparse_endian_bool(le, 32);
+define_unparse_endian_bool(le, 8);
 
 define_unparse_endian_real(le, double, 64)
-
 define_unparse_endian_real(le, float, 32)
 
-define_unparse_endian_integer(le, uint, 64)
-
-define_unparse_endian_integer(le, uint, 32)
+define_unparse_endian_integer(le, int, 16)
+define_unparse_endian_integer(le, int, 32)
+define_unparse_endian_integer(le, int, 64)
+define_unparse_endian_integer(le, int, 8)
 
 define_unparse_endian_integer(le, uint, 16)
-
+define_unparse_endian_integer(le, uint, 32)
+define_unparse_endian_integer(le, uint, 64)
 define_unparse_endian_integer(le, uint, 8)
 
-define_unparse_endian_integer(le, int, 64)
-
-define_unparse_endian_integer(le, int, 32)
-
-define_unparse_endian_integer(le, int, 16)
-
-define_unparse_endian_integer(le, int, 8)
+// Define function to unparse fill bytes until end position is reached
+
+void
+unparse_fill_bytes(size_t end_position, UState *ustate)
+{
+    const char buffer = ustate->fill_byte;
+    while (!ustate->error_msg && ustate->position < end_position)
+    {
+        size_t count = fwrite(&buffer, 1, sizeof(buffer), ustate->stream);
+
+        ustate->position += count;
+        if (count < sizeof(buffer))
+        {
+            ustate->error_msg = eof_or_error_msg(ustate->stream);
+        }
+    }
+}

Review comment:
       Not sure it's worth doing unless profiling show issues here, but the parse/unparse_fill_bytes functions reading/writing a single byte in a loop might slow performance. Might be worth considering having a larger buffer that is used for these for read/writes, especially if you're expecting lots of skipped bytes.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/unparsers.c
##########
@@ -42,6 +75,8 @@
             buffer.i_val = hto##endian##bits(buffer.i_val);                    \
             size_t count =                                                     \
                 fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);       \
+                                                                               \
+            ustate->position += count;                                         \

Review comment:
       Lots of duplication related to fread/fwrite to actually do the read/right, update the position, check for errors. etc. Maybe add a helper function that does all that handles all that logic for us?

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/infoset.h
##########
@@ -108,6 +110,8 @@ typedef struct PState
 typedef struct UState
 {
     FILE *      stream;    // output to write data to
+    size_t      position;  // 0-based position in stream
+    char        fill_byte; // fill byte to write when needed

Review comment:
       Note that dfdl:fillByte can be different from element to element. It may make sense for this to be part of the ERD rather than the UState.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryBooleanCodeGenerator.scala
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.Assert
+import org.apache.daffodil.schema.annotation.props.gen.BitOrder
+import org.apache.daffodil.schema.annotation.props.gen.ByteOrder
+import org.apache.daffodil.schema.annotation.props.gen.OccursCountKind
+import passera.unsigned.ULong
+
+trait BinaryBooleanCodeGenerator {
+
+  def binaryBooleanGenerateCode(e: ElementBase, cgState: CodeGeneratorState): Unit = {
+    // For the time being this is a very limited back end.
+    // So there are some restrictions to enforce.
+    e.schemaDefinitionUnless(e.bitOrder eq BitOrder.MostSignificantBitFirst, "Only dfdl:bitOrder 'mostSignificantBitFirst' is supported.")
+    val byteOrder: ByteOrder = {
+      e.schemaDefinitionUnless(e.byteOrderEv.isConstant, "Runtime dfdl:byteOrder expressions not supported.")
+      e.byteOrderEv.constValue
+    }
+    val lengthInBits: Long = {
+      e.schemaDefinitionUnless(e.elementLengthInBitsEv.isConstant, "Runtime dfdl:length expressions not supported.")
+      val len = e.elementLengthInBitsEv.constValue.get
+      len match {
+        case 8 | 16 | 32 => len
+        case _ => e.SDE("Boolean lengths other than 8, 16, or 32 bits are not supported.")
+      }
+    }
+    Assert.invariant(e.binaryBooleanTrueRep.isEmpty || e.binaryBooleanTrueRep.getULong >= ULong(0))
+    Assert.invariant(e.binaryBooleanFalseRep >= ULong(0))
+
+    val initialValue = "true"
+    val fieldName = e.namedQName.local
+    val conv = if (byteOrder eq ByteOrder.BigEndian) "be" else "le"
+    val prim = s"bool$lengthInBits"
+    val trueRep = if (e.binaryBooleanTrueRep.isDefined) e.binaryBooleanTrueRep.getULong else -1
+    val falseRep = e.binaryBooleanFalseRep
+    val arraySize = if (e.occursCountKind == OccursCountKind.Fixed) e.maxOccurs else 0
+    val fixed = e.xml.attribute("fixed")
+    val fixedValue = if (fixed.isDefined) fixed.get.text else ""

Review comment:
       Would be nice if fixed support was added to DSOM so we don't have to duplicat this logic. We'll need this in runtime1 at some point.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/parsers.c
##########
@@ -69,44 +113,72 @@
         }                                                                      \
     }
 
-// Define functions to parse binary real and integer numbers
+// Define functions to parse binary real numbers and integers
 
-define_parse_endian_real(be, double, 64)
+define_parse_endian_bool(be, 16);
+define_parse_endian_bool(be, 32);
+define_parse_endian_bool(be, 8);
 
+define_parse_endian_real(be, double, 64)
 define_parse_endian_real(be, float, 32)
 
-define_parse_endian_integer(be, uint, 64)
-
-define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, int, 16)
+define_parse_endian_integer(be, int, 32)
+define_parse_endian_integer(be, int, 64)
+define_parse_endian_integer(be, int, 8)
 
 define_parse_endian_integer(be, uint, 16)
-
+define_parse_endian_integer(be, uint, 32)
+define_parse_endian_integer(be, uint, 64)
 define_parse_endian_integer(be, uint, 8)
 
-define_parse_endian_integer(be, int, 64)
-
-define_parse_endian_integer(be, int, 32)
-
-define_parse_endian_integer(be, int, 16)
-
-define_parse_endian_integer(be, int, 8)
+define_parse_endian_bool(le, 16);
+define_parse_endian_bool(le, 32);
+define_parse_endian_bool(le, 8);
 
 define_parse_endian_real(le, double, 64)
-
 define_parse_endian_real(le, float, 32)
 
-define_parse_endian_integer(le, uint, 64)
-
-define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, int, 16)
+define_parse_endian_integer(le, int, 32)
+define_parse_endian_integer(le, int, 64)
+define_parse_endian_integer(le, int, 8)
 
 define_parse_endian_integer(le, uint, 16)
-
+define_parse_endian_integer(le, uint, 32)
+define_parse_endian_integer(le, uint, 64)
 define_parse_endian_integer(le, uint, 8)
 
-define_parse_endian_integer(le, int, 64)
-
-define_parse_endian_integer(le, int, 32)
-
-define_parse_endian_integer(le, int, 16)
-
-define_parse_endian_integer(le, int, 8)
+// Define function to parse fill bytes until end position is reached
+
+void
+parse_fill_bytes(size_t end_position, PState *pstate)
+{
+    while (!pstate->error_msg && pstate->position < end_position)
+    {
+        char   buffer;
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+
+        pstate->position += count;
+        if (count < sizeof(buffer))
+        {
+            pstate->error_msg = eof_or_error_msg(pstate->stream);
+        }
+    }
+}
+
+// Define function to validate number is same as fixed value after parse
+
+void
+parse_validate_fixed(bool same, const char *element, PState *pstate)
+{
+    if (!pstate->error_msg && !same)

Review comment:
       This is sort of a design issue, but it seems many functions are essentially wrapped in a ``if (!pstate->error_msg) { ... }`` block. Has there been any thought to changing these functions so that they assume they are only going to be called if there isn't already an error? I assume this would mean generated code would then look more like:
   
   ```c
   foo = parse_do_something(..., pstate)
   if (pstate->error_msg)
   {
      // handle error
      return;
   }
   ```
   But that seems like the right behavior. I would execpt callers of functions to handle any errors returned by that function.

##########
File path: daffodil-runtime2/src/main/resources/c/libcli/xml_reader.c
##########
@@ -292,48 +328,50 @@ xmlNumberElem(XMLReader *reader, const ERD *erd, void *numLocation)
             // Check for any errors getting the number
             const char *errstr = NULL;
 
-            // Handle varying bit lengths of both signed & unsigned integers and
-            // floating point numbers
+            // Handle varying bit lengths of both signed & unsigned numbers
             const enum TypeCode typeCode = erd->typeCode;
             switch (typeCode)
             {
-            case PRIMITIVE_UINT64:
-                *(uint64_t *)numLocation =
-                    (uint64_t)strtounum(number_from_xml, UINT64_MAX, &errstr);
+            case PRIMITIVE_BOOLEAN:
+                *(bool *)numLocation = strtobool(number_from_xml, &errstr);
                 break;
-            case PRIMITIVE_UINT32:
-                *(uint32_t *)numLocation =
-                    (uint32_t)strtounum(number_from_xml, UINT32_MAX, &errstr);
-                break;
-            case PRIMITIVE_UINT16:
-                *(uint16_t *)numLocation =
-                    (uint16_t)strtounum(number_from_xml, UINT16_MAX, &errstr);
+            case PRIMITIVE_FLOAT:
+                *(float *)numLocation = strtofnum(number_from_xml, &errstr);
                 break;
-            case PRIMITIVE_UINT8:
-                *(uint8_t *)numLocation =
-                    (uint8_t)strtounum(number_from_xml, UINT8_MAX, &errstr);
+            case PRIMITIVE_DOUBLE:
+                *(double *)numLocation = strtodnum(number_from_xml, &errstr);
                 break;
-            case PRIMITIVE_INT64:
-                *(int64_t *)numLocation = (int64_t)strtonum(
-                    number_from_xml, INT64_MIN, INT64_MAX, &errstr);
+            case PRIMITIVE_INT16:
+                *(int16_t *)numLocation = (int16_t)strtonum(
+                    number_from_xml, INT16_MIN, INT16_MAX, &errstr);
                 break;
             case PRIMITIVE_INT32:
                 *(int32_t *)numLocation = (int32_t)strtonum(
                     number_from_xml, INT32_MIN, INT32_MAX, &errstr);
                 break;
-            case PRIMITIVE_INT16:
-                *(int16_t *)numLocation = (int16_t)strtonum(
-                    number_from_xml, INT16_MIN, INT16_MAX, &errstr);
+            case PRIMITIVE_INT64:
+                *(int64_t *)numLocation = (int64_t)strtonum(
+                    number_from_xml, INT64_MIN, INT64_MAX, &errstr);
                 break;
             case PRIMITIVE_INT8:
                 *(int8_t *)numLocation = (int8_t)strtonum(
                     number_from_xml, INT8_MIN, INT8_MAX, &errstr);
                 break;
-            case PRIMITIVE_FLOAT:
-                *(float *)numLocation = strtofnum(number_from_xml, &errstr);
+            case PRIMITIVE_UINT16:
+                *(uint16_t *)numLocation =
+                    (uint16_t)strtounum(number_from_xml, UINT16_MAX, &errstr);
                 break;
-            case PRIMITIVE_DOUBLE:
-                *(double *)numLocation = strtodnum(number_from_xml, &errstr);
+            case PRIMITIVE_UINT32:
+                *(uint32_t *)numLocation =
+                    (uint32_t)strtounum(number_from_xml, UINT32_MAX, &errstr);
+                break;
+            case PRIMITIVE_UINT64:
+                *(uint64_t *)numLocation =
+                    (uint64_t)strtounum(number_from_xml, UINT64_MAX, &errstr);
+                break;
+            case PRIMITIVE_UINT8:
+                *(uint8_t *)numLocation =
+                    (uint8_t)strtounum(number_from_xml, UINT8_MAX, &errstr);

Review comment:
       Minor comment, I found "numLocation" a bit confusing. At first I thought maybe these functions returned a "location" (e.g.. schema file/line number) or something and wasn't sure how that related to a number. I guess this location is referring to this being a pointer so some location in a structure or something? Personally, I think "void *num" would be more clear that this is a pointer to some kind of number.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/unparsers.c
##########
@@ -27,6 +28,38 @@
 
 // Macros to define unparse_<endian>_<type> functions
 
+#define define_unparse_endian_bool(endian, bits)                               \
+    void unparse_##endian##_bool##bits(bool number, int64_t true_rep,          \
+                                       uint32_t false_rep, UState *ustate)     \
+    {                                                                          \
+        if (!ustate->error_msg)                                                \
+        {                                                                      \
+            union                                                              \
+            {                                                                  \
+                char           c_val[sizeof(uint##bits##_t)];                  \
+                uint##bits##_t i_val;                                          \
+            } buffer;                                                          \
+            if (number)                                                        \
+            {                                                                  \
+                buffer.i_val =                                                 \
+                    (true_rep < 0) ? ~false_rep : (uint32_t)true_rep;          \
+            }                                                                  \
+            else                                                               \
+            {                                                                  \
+                buffer.i_val = false_rep;                                      \
+            }                                                                  \
+            buffer.i_val = hto##endian##bits(buffer.i_val);                    \
+            size_t count =                                                     \
+                fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);       \
+                                                                               \
+            ustate->position += count;                                         \
+            if (count < sizeof(buffer))                                        \
+            {                                                                  \
+                ustate->error_msg = eof_or_error_msg(ustate->stream);          \

Review comment:
       The error messages output in eof_or_error_msg only mention reading, so this might lead to a confusing error message during unparsing. Suggest having unique error messages output in eof_or_error_msg depending on if we are parsing or unparsing.

##########
File path: daffodil-runtime2/src/main/resources/c/libruntime/unparsers.c
##########
@@ -27,6 +28,38 @@
 
 // Macros to define unparse_<endian>_<type> functions
 
+#define define_unparse_endian_bool(endian, bits)                               \
+    void unparse_##endian##_bool##bits(bool number, int64_t true_rep,          \
+                                       uint32_t false_rep, UState *ustate)     \

Review comment:
       Perhaps this is nice since it is consistent with parse, but you really don't need the int64_t in this case. For unparse, we *always* have a true rep. It's either the one provided by the user, or if not provided it's ~false_rep, so this can be calculated at schema compile time, which simplies the logic below abit, since there's no need to check true_rep < 0.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/BinaryIntegerKnownLengthCodeGenerator.scala
##########
@@ -46,23 +46,32 @@ trait BinaryIntegerKnownLengthCodeGenerator {
       case 16 => "0xCCCC"
       case 32 => "0xCCCCCCCC"
       case 64 => "0xCCCCCCCCCCCCCCCC"
-      case _ => e.SDE("Lengths other than 8, 16, 32, or 64 bits are not supported.")
+      case _ => e.SDE("Integer lengths other than 8, 16, 32, or 64 bits are not supported.")
     }
     val fieldName = e.namedQName.local
-    val integer = if (g.signed) s"int${lengthInBits}" else s"uint${lengthInBits}"
     val conv = if (byteOrder eq ByteOrder.BigEndian) "be" else "le"
+    val prim = if (g.signed) s"int${lengthInBits}" else s"uint${lengthInBits}"
     val arraySize = if (e.occursCountKind == OccursCountKind.Fixed) e.maxOccurs else 0
+    val fixed = e.xml.attribute("fixed")
+    val fixedValue = if (fixed.isDefined) fixed.get.text else ""
 
-    def addSimpleTypeStatements(deref: String): Unit = {
+    def addStatements(deref: String): Unit = {
       val initStatement = s"    instance->$fieldName$deref = $initialValue;"
-      val parseStatement = s"    parse_${conv}_$integer(&instance->$fieldName$deref, pstate);"
-      val unparseStatement = s"    unparse_${conv}_$integer(instance->$fieldName$deref, ustate);"
+      val parseStatement = s"    parse_${conv}_$prim(&instance->$fieldName$deref, pstate);"
+      val unparseStatement = s"    unparse_${conv}_$prim(instance->$fieldName$deref, ustate);"
       cgState.addSimpleTypeStatements(initStatement, parseStatement, unparseStatement)
+
+      if (fixedValue.nonEmpty) {
+        val init2 = ""
+        val parse2 = s"""    parse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", pstate);"""
+        val unparse2 = s"""    unparse_validate_fixed(instance->$fieldName$deref == $fixedValue, "$fieldName", ustate);"""
+        cgState.addSimpleTypeStatements(init2, parse2, unparse2)
+      }
     }
     if (arraySize > 0)
       for (i <- 0 until arraySize)
-        addSimpleTypeStatements(s"[$i]")
+        addStatements(s"[$i]")
     else
-      addSimpleTypeStatements("")
+      addStatements("")
   }

Review comment:
       Lots of duplicate code in these addStatements fucntions. Wonering if that can be refactored out?

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala
##########
@@ -67,8 +66,21 @@ class CodeGeneratorState {
     val C = localName(context)
     val initStatements = structs.top.initStatements.mkString("\n")
     val initChoiceStatements = structs.top.initChoiceStatements.mkString("\n")
-    val parserStatements = structs.top.parserStatements.mkString("\n")
-    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val hasStatements = structs.top.parserStatements.nonEmpty
+    val parserStatements = if (hasStatements)
+      structs.top.parserStatements.mkString("\n")
+    else
+      s"""    // Empty struct, but need to prevent compiler warnings
+         |    (void)${C}_compute_offsets;
+         |    (void)instance;
+         |    (void)pstate;""".stripMargin
+    val unparserStatements = if (hasStatements)

Review comment:
       Should there be a unique hasParserStatemens and hasUnparserStatements? Another option would be to just have
   ```scala
   val parserStatements = structs.top.parserStatements.mkString("\n")
   if (parserStatements == "") {
     s""" //Empty struct, ..."""
   }
   ```

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala
##########
@@ -262,6 +288,18 @@ class CodeGeneratorState {
       structs.top.parserStatements += parseStatement
       structs.top.unparserStatements += unparseStatement
     }
+
+    // Implement padding if complex type has an explicit length
+    if (context.maybeFixedLengthInBits.isDefined && context.maybeFixedLengthInBits.get > 0) {
+      val octalFillByte = context.fillByteEv.constValue.toByte.toOctalString
+      val parseStatement = s"    parse_fill_bytes(end_position, pstate);"
+      val unparseStatement =
+        s"""    ustate->fill_byte = '\\$octalFillByte';
+           |    unparse_fill_bytes(end_position, ustate);""".stripMargin
+

Review comment:
       Ah, I see how you support different elements having different fill bytes now. Any reason to not just pass octalFillByte as a parameter to ``unparse_fill_bytes`` rather than setting it in the ustate and then having the function reading from the ustate?

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala
##########
@@ -316,41 +354,55 @@ class CodeGeneratorState {
   }
 
   def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
-    structs.top.initStatements += initStatement
-    structs.top.parserStatements += parseStatement
-    structs.top.unparserStatements += unparseStatement
+    if (initStatement.nonEmpty) structs.top.initStatements += initStatement
+    if (parseStatement.nonEmpty) structs.top.parserStatements += parseStatement
+    if (unparseStatement.nonEmpty) structs.top.unparserStatements += unparseStatement
   }

Review comment:
       Can these if-statements be removed? I would think += would be a no-op if the strings were empty. 

##########
File path: project/Rat.scala
##########
@@ -122,6 +122,13 @@ object Rat {
     file("daffodil-tdml-lib/src/test/resources/test/tdml/test.txt"),
     file("daffodil-tdml-processor/src/test/resources/test/tdml/test.bin"),
     file("daffodil-tdml-processor/src/test/resources/test/tdml/test.txt"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/ISRM_green_to_orange_60000_parse_0.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/ISRM_green_to_orange_60000_parse_1.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/ISRM_orange_to_green_60002_parse.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/MPU_green_to_orange_60004_parse.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/MPU_orange_to_green_60006_parse_0.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/MPU_orange_to_green_60006_parse_1.dat"),
+    file("daffodil-test/src/test/resources/org/apache/daffodil/runtime2/aptina_settings_parse.dat"),

Review comment:
       Just want to confirm that these files are apache licensed? The naming implies that maybe they are coming from some tool or from a website or something and not created from scratch by you? If they are generated by a tool, the license of these files may be considered as having the same license of the tool.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/generators/CodeGeneratorState.scala
##########
@@ -362,21 +414,72 @@ class CodeGeneratorState {
     structs.pop()
   }
 
+  // Gets length from explicit length declaration if any, otherwise from base type's implicit length
+  private def getLengthInBits(e: ElementBase): Long = {
+    e.schemaDefinitionUnless(e.elementLengthInBitsEv.isConstant, "Runtime dfdl:length expressions are not supported.")
+    e.elementLengthInBitsEv.constValue.get
+  }
+
+  // Because schema authors don't always get types right, allows explicit lengths to override implicit lengths

Review comment:
       Is there any chance this might confuse people using the generated code/structs? For example, I can maybe imagine a case where someone wants all the generated structs to only have ``int32_t``'s for some capability reason, so they make all the primitive ``xs:int``. But then individual fields might have lengths that aren't 4-bytes in the data. So they might expect this to essentialy upcast rather than always use the appropriate sized type?
   
   All that to say, what you have seems reasonable, but maybe it's worth outputting a warning?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org