You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by "stevedlawrence (via GitHub)" <gi...@apache.org> on 2023/04/07 17:53:30 UTC

[GitHub] [daffodil] stevedlawrence commented on a diff in pull request #998: Add left over data check to generated C parsers

stevedlawrence commented on code in PR #998:
URL: https://github.com/apache/daffodil/pull/998#discussion_r1160850856


##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/unparsers.c:
##########
@@ -465,3 +465,30 @@ unparse_check_bounds(const char *name, size_t count, size_t minOccurs, size_t ma
         ustate->error = &error;
     }
 }
+
+// Flush the fractional byte if not done yet
+
+void
+flush_fractional_byte(UState *ustate)
+{
+    // Skip the flush if we already have an error
+    if (!ustate->error)
+    {
+        // Do we have any unwritten bits left in the fractional byte?
+        if (ustate->numUnwritBits)
+        {
+            // Fill the fractional byte
+            size_t num_bits_fill = BYTE_WIDTH - ustate->numUnwritBits;
+            ustate->unwritBits <<= num_bits_fill;

Review Comment:
   This means the byte to write is padded with zero's instead of using the `dfdl:fillByte` property. Though, this is probably fine for runtime2. In fact, I think runtime1 might even have this same bug, so at least we are consistent :wink: Just something to be aware about that I *think* this is technically the wrong behavior.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/CodeGeneratorState.scala:
##########
@@ -354,19 +365,20 @@ class CodeGeneratorState(private val root: ElementBase) {
          |// Initialize, parse, and unparse nodes of the infoset
          |
          |$finalImplementation
-         |// Return a root element for parsing or unparsing the infoset
+         |// Get an infoset (optionally clearing it first) for parsing/walking
          |
          |InfosetBase *
-         |rootElement(void)
+         |get_infoset(bool clear_infoset)
          |{
-         |    static bool initialized;
-         |    static $rootName root;
-         |    if (!initialized)
+         |    static $rootName infoset;
+         |
+         |    if (clear_infoset)
          |    {
-         |        ${rootName}_initERD(&root, (InfosetBase *)&root);
-         |        initialized = true;
+         |        memset(&infoset, 0, sizeof(infoset));

Review Comment:
   Is it necessary to walk through the infoset and free any pointers that point to heap allocated memory to avoid memory leaks?
   
   It looks like maybe that's only hexBinary? That's the only place I see malloc. Are only bounded arrays supported by runtime1 an so don't need heap allocations?



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/DaffodilCExamplesGenerator.scala:
##########
@@ -81,19 +81,23 @@ object DaffodilCExamplesGenerator {
     val nestedRootName = Some("NestedUnion")
     val padTestSchema = schemaDir / "padtest.dfdl.xsd"
     val padTestRootName = None
+    val simpleSchema = schemaDir / "simple.dfdl.xsd"
+    val simpleRootName = Some("simple-byte")
     val variableLenSchema = schemaDir / "variablelen.dfdl.xsd"
     val variableLenRootName = Some("expressionElement")
 
     val examplesDir = os.Path(args(0))
     val exNumsExampleDir = examplesDir / "ex_nums"
     val nestedExampleDir = examplesDir / "NestedUnion"
     val padTestExampleDir = examplesDir / "padtest"
+    val simpleExampleDir = examplesDir / "simple"
     val variableLenExampleDir = examplesDir / "variablelen"

Review Comment:
   Doesn't have to be done in this PR, but at some point we may want to turn this into an array of tuples and we can just iterate over them, maybe something like:
   ```scala
   val examples = Array(
     ("ex_nums.dfdl.xsd", None, "ex_nums"),
     ("nested.dfdl.xsd", Some("NestedUnion"), "NestedUnion")
     ...,
   )
   examples.foreach { case (schema, rootOpt, dir) =>
     updateCExample(schemaDir / schema, rootOpt, examplesDir / dir)
   }
   ```
   Then adding new examples becomes just adding a new entry in the array and things aren't spread out among different variables.
   
   We're starting to get enough examples where this could clean things up a bit.



##########
daffodil-codegen-c/src/main/scala/org/apache/daffodil/codegen/c/generators/CodeGeneratorState.scala:
##########
@@ -727,22 +778,32 @@ class CodeGeneratorState(private val root: ElementBase) {
 
   // Recursively builds a hopefully unique name using the given StringBuilder
   private def buildName(sc: SchemaComponent, sb: StringBuilder): StringBuilder = {
+    // Append schema component's name
     sc match {
       case eb: ElementBase => sb ++= eb.namedQName.local += '_'
       case gd: GlobalElementDecl => sb ++= gd.namedQName.local += '_'
       case ct: GlobalComplexTypeDef => sb ++= ct.namedQName.local += '_'
       case _ => // don't include other schema components in qualified name
     }
+    // Recursively append parent schema components' names
     sc.optLexicalParent.foreach {
       buildName(_, sb)
     }
+    // Ensure name contains only legal characters for C identifiers
+    lazy val legalChars: immutable.Set[Char] =
+      Set('_') ++ ('a' to 'z') ++ ('A' to 'Z') ++ ('0' to '9')
+    for (i <- sb.indices) {
+      if (!legalChars.contains(sb.charAt(i))) {
+        sb.setCharAt(i, '_')
+      }
+    }

Review Comment:
   This logic about names also exists the `cName` function. Can those be refactored into a common function so there's only one place that deals with creating legal names?



##########
daffodil-codegen-c/src/main/resources/org/apache/daffodil/codegen/c/files/libruntime/infoset.c:
##########
@@ -123,64 +124,55 @@ get_erd_ns(const ERD *erd)
     return erd->namedQName.ns;
 }
 
-// walkInfoset - walk each node of an infoset and call
-// VisitEventHandler methods
+// parse_data - parse an input stream into an infoset, check for
+// leftover data, and return any errors in pstate
 
-const Error *
-walkInfoset(const VisitEventHandler *handler, const InfosetBase *infoNode)
+void
+parse_data(InfosetBase *infoset, PState *pstate)
 {
-    const Error *error = handler->visitStartDocument(handler);
+    infoset->erd->parseSelf(infoset, pstate);
+    no_leftover_data(pstate);
+}
 
-    if (!error)
-    {
-        error = walkInfosetNode(handler, infoNode);
-    }
-    if (!error)
-    {
-        error = handler->visitEndDocument(handler);
-    }
+// unparse_infoset - unparse an infoset to an output stream, flush the
+// fractional byte if not done yet, and return any errors in ustate
 
-    return error;
+void
+unparse_infoset(InfosetBase *infoset, UState *ustate)
+{
+    infoset->erd->unparseSelf(infoset, ustate);
+    flush_fractional_byte(ustate);

Review Comment:
   The term we use for this in runtime1 is "fragment byte". I'm not sure how important it is that we are consistent in the different runtimes, so up to you if you want to change it or not.



##########
daffodil-cli/src/main/scala/org/apache/daffodil/cli/Main.scala:
##########
@@ -1618,7 +1618,7 @@ class Main(
                         fail += 1
                         if (testOpts.info() > 0) {
                           STDOUT.println("  Failure Information:")
-                          STDOUT.println(indent(e.getMessage(), 4))
+                          STDOUT.println(indent(e.toString, 4))

Review Comment:
   This seems fine, but your commit mentions the TDML Runner is throwing a NPE? That's probably a bug, any idea what caused it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org