Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2020/09/28 21:11:53 UTC

[GitHub] [incubator-daffodil] tuxji opened a new pull request #422: WIP: Add runtime2 backend with C code generator

tuxji opened a new pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422


   This pull request adds a new runtime2 backend to Daffodil.  This pull
   request is only a baby step which can handle only 32-bit big-endian
   binary integers, but it implements enough functionality to serve as a
   start for discussion and collaboration.  This pull request's
   integration branch is a work in progress and will be rebased over time
   to keep up with the master branch so you will have to run the
   following commands every time you want to pull new changes into your
   checkout of the integration branch:
   
       git pull --rebase
       git diff ORIG_HEAD..HEAD
   
   Please use TestRuntime2.dfdl.xsd and TestRuntime2.tdml as a guide of
   how to write a small DFDL schema containing some xs:int elements and
   run daffodil on a corresponding TDML file to parse and unparse some
   binary test data using the runtime2 backend:
   
   ```bash
   $ daffodil test daffodil-runtime2/src/test/resources/org/apache/daffodil/runtime2/TestRuntime2.tdml
   Creating DFDL Test Suite for daffodil-runtime2/src/test/resources/org/apache/daffodil/runtime2/TestRuntime2.tdml
   [Pass] parse_int32
   [Pass] unparse_int32
   
   Total: 2, Pass: 2, Fail: 0, Not Found: 0
   ```
   
   The runtime2 backend will generate C code from your DFDL schema in a
   temporary directory (/tmp/NNNNNNNNNNNN with NNN... all numbers), build
   a C-based executable, run the TDML tests through the executable, and
   check the results.  You will be able to take the C code and use it as
   a parsing/unparsing library in an embedded device with limited memory
   and power.  Our goal is to implement the smallest possible conforming
   subset of DFDL as described in the "Runtime 2 Design" table of
   <https://cwiki.apache.org/confluence/display/DAFFODIL/WIP%3A+Daffodil+Runtime+2>.
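   
   As an illustration only, the conversion needed for a 32-bit
   big-endian binary integer can be sketched in portable C as shown
   below.  The names parse_be_int32 and unparse_be_int32 are
   hypothetical and are not part of this pull request; the checked-in
   example appears to use be32toh/htobe32 from <endian.h> instead, so
   treat this as a sketch of the idea rather than the generated code.
   
   ```c
   #include <stddef.h> // for size_t, NULL
   #include <stdint.h> // for int32_t, uint32_t, uint8_t
   #include <string.h> // for memcpy
   
   // Hypothetical helper (assumption, not from the pull request):
   // decode a big-endian 32-bit two's-complement integer from buf.
   // Returns 0 on success, -1 if arguments are invalid or fewer than
   // 4 bytes are available.
   int
   parse_be_int32(const uint8_t *buf, size_t len, int32_t *out)
   {
       if (buf == NULL || out == NULL || len < 4)
       {
           return -1;
       }
       uint32_t bits = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
       // Copy the bit pattern instead of casting so the result does not
       // depend on implementation-defined signed conversion
       memcpy(out, &bits, sizeof *out);
       return 0;
   }
   
   // Hypothetical helper (assumption, not from the pull request):
   // encode value as 4 big-endian bytes into buf.
   // Returns 0 on success, -1 if buf is NULL or too small.
   int
   unparse_be_int32(int32_t value, uint8_t *buf, size_t len)
   {
       if (buf == NULL || len < 4)
       {
           return -1;
       }
       uint32_t bits;
       memcpy(&bits, &value, sizeof bits);
       buf[0] = (uint8_t)(bits >> 24);
       buf[1] = (uint8_t)(bits >> 16);
       buf[2] = (uint8_t)(bits >> 8);
       buf[3] = (uint8_t)bits;
       return 0;
   }
   ```
   
   Shifting bytes explicitly keeps the sketch independent of the host's
   byte order and of any non-standard header, which may matter on an
   embedded target.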
   
   Note that you will need a C11 or C18 compiler ("cc") and a tiny XML
   library called Mini-XML ("mxml.h" and "libmxml.a") to build Daffodil
   on this branch and run the TDML tests.  Many systems have both of
   these available as installable binary packages with names like "gcc",
   "libmxml-dev", etc.  If there is enough demand, we can bundle the
   Mini-XML library sources with Daffodil to make it easier to build the
   runtime2 backend on systems that don't have an installable
   "libmxml-dev" package.  We also have considered making runtime2 (and a
   possible future runtime3 targeting programmable hardware logic)
   optional parts that can be distributed separately and plugged into
   Daffodil on demand (for modularity, not for legal purposes; we want
   all Daffodil code to be covered by the Apache 2 license).
   
   DAFFODIL-2202
   ______________________________________________________________________
   
   Questions & loose ends:
   
   1. I just found out that the os-lib library author stopped
   publishing Scala 2.11 builds in March 2019.  I hadn't known that
   until I enabled Daffodil's GitHub Actions CI workflow in my fork and
   saw the compilation problems.  I've opened a GitHub issue asking the
   os-lib author if he would consider publishing Scala 2.11 builds again
   for a period of time until Daffodil stops supporting Scala 2.11, but
   if he says no, should I replace all my calls of the os-lib library or
   make Daffodil start supporting Scala 2.13 and stop supporting 2.11
   like the os-lib author has done?
   
   2. Search for "TODO" in the changelog below to find some more loose
   ends and questions:
   
      a. In daffodil-core's SequenceCombinator.scala, generateCode needs
   to support generating code for each child in a sequence with multiple
   children.
   
      b. In daffodil-core's
   runtime2/generators/BinaryNumberParserGenerator.scala, generateCode
   needs to generate code that parses and unparses binary numbers as
   securely as possible.  We may need to develop some C functions that
   are more secure than what the current code uses.
   
      c. In daffodil-core's runtime2/generators/ParserGenerator.scala,
   defineQNameInit doesn't handle multiple xmlns=ns declarations yet.
   
      d. Do we need a "runtime" tunable as well as a "tdmlImplementation"
   tunable within Daffodil?  The "tdmlImplementation" tunable allows TDML
   tests to use runtime2 instead of runtime1.  The "runtime" tunable
   could allow "daffodil parse ..." and "daffodil unparse ..." commands
   to use runtime2 instead of runtime1 too, but that seems redundant
   because the runtime2 backend dynamically generates and builds an
   executable from C code and calls the C-based executable with similar
   "daffodil parse ..." and "daffodil unparse ..." command lines as well.
   I think it makes more sense to add a new "daffodil generate ..."
   command which will generate the C code and executable in your own
   directory so you can do whatever you want with the C code or
   executable afterwards.
   
      e. Should we define a pluggable CodeGeneratorState interface for
   runtime2 to implement in daffodil-runtime1's DFDLParserUnparser.scala?
   We may want to wait for a future runtime3 in order to make the common
   interface clearer.
   
      f. In daffodil-runtime2's stack.{c,h}, we may need to use
   heap-allocated storage rather than statically allocated storage even
   though we use the stack only to run TDML tests.
   
   3. The generated_code.{c,h} files checked in here were originally
   mockups made in Emacs & Visual Studio Code for design and debugging
   purposes.  Depending on whether your DFDL schema is the same or
   different, runtime2 will generate a pair of generated_code.{c,h} files
   in a temporary directory with the same or different contents than
   these mockup files.  We are checking in these files only to make it
   easier to continue changing and debugging the mockup code in Visual
   Studio Code.
   
   ______________________________________________________________________
   
   ChangeLog:
   
   In .github/workflows/main.yml, install MSYS2 environment to give us a
   C compiler on Windows.  Install mxml library on both Linux and Windows
   so we can build/link our C code with it.
   
   In build.sbt, build daffodil-runtime2 module with CcPlugin configured
   to compile the runtime2 C source files into a "libruntime2.a" static
   library.
   
   In daffodil-cli/bin.NOTICE, fix an attribution notice.
   
   In daffodil-cli/build.sbt, configure the Universal plugin to include
   the runtime2 C header files and "libruntime2.a" library in an
   installed daffodil so that an installed daffodil can use the runtime2
   backend.  See also the code in GeneratedCodeCompiler which looks for
   the runtime2 C header and "libruntime2.a" library in either an
   installed daffodil location or the daffodil source tree depending on
   where and how the code is executed.  We are using a "simplest design
   that can work" approach until future requirements become clearer.
   
   In daffodil-core's Compiler.scala, add a ProcessorFactory
   generateCode method to call generateCode on a root document and
   return a fully populated CodeGeneratorState object containing
   generated C code.
   
   In daffodil-core's ElementDeclGrammarMixin.scala, add a
   RootGrammarMixin generateCode method to call generateCode on its
   document element.
   
   In daffodil-core's Grammar.scala, add a SeqComp generateCode method to
   call generateCode on its children.
   
   In daffodil-core's GrammarTerm.scala, give abstract class Gram a
   GramRuntime2Mixin trait as well as a GramRuntime1Mixin trait.
   
   In daffodil-core's Production.scala, add a Prod generateCode method to
   call generateCode on its gram object.
   
   In daffodil-core's ElementCombinator.scala, add an ElementCombinator
   generateCode method to call generateCode on its subComb object, add
   empty CaptureContentLengthStart, CaptureContentLengthEnd,
   CaptureValueLengthStart, and CaptureValueLengthEnd generateCode
   methods, and add an ElementParseAndUnspecifiedLength generateCode
   method to instantiate and call generateCode on an
   ElementParserGenerator.
   
   In daffodil-core's PrimitivesBinaryNumber.scala, add a
   BinaryIntegerKnownLength generateCode method to call generateCode on
   its generator object.
   
   In daffodil-core's SequenceChild.scala, add a
   ScalarOrderedSequenceChild generateCode method to call generateCode on
   its term's termContentBody object.
   
   In daffodil-core's SequenceCombinator.scala, add an OrderedSequence
   generateCode method to call generateCode on a single child of a
   sequence (TODO: need to support generating code for each child in a
   sequence with multiple children).
   
   In daffodil-core's SpecifiedLength.scala, add a
   SpecifiedLengthImplicit generateCode method to call generateCode on
   its eGram object.
   
   In daffodil-core's runtime2/GeneratedCodeCompiler.scala, implement a
   GeneratedCodeCompiler compile method to find the runtime2 C header
   files and "libruntime2.a" library, write the generated C code to a
   temporary directory, run the C compiler, capture any compilation
   diagnostics, and add them to its ProcessorFactory's diagnostics.  Also
   implement a GeneratedCodeCompiler dataProcessor method to return a
   Runtime2DataProcessor object with the path of the executable that was
   just compiled.
   
   In daffodil-core's runtime2/GramRuntime2Mixin.scala, implement a
   GramRuntime2Mixin trait with a generateCode method which throws an SDE
   if a subclass doesn't implement the generateCode method.
   
   In daffodil-core's runtime2/Runtime2DataProcessor.scala, implement a
   Runtime2DataProcessor class which extends/implements
   DFDL.DataProcessorBase while adding its own new parse and unparse
   methods.  The parse method writes the input file to a temporary
   directory, runs the executable in that directory telling it to parse the
   input file and write an output file, creates a ParseResult object with
   the path of the output file, adds any runtime errors to the
   ParseResult's diagnostics, and returns the ParseResult object.  The
   unparse method also writes the input file to a temporary directory,
   runs the executable in that directory telling it to unparse the input
   file and write an output file, creates an UnparseResult object with
   the path of the output file, adds any runtime errors to the
   UnparseResult's diagnostics, and returns the UnparseResult object.
   Implement a Runtime2DataLocation object with all zero fields since we
   can't track the executable's read position anyway.  Implement both
   ParseResult and UnparseResult classes using that Runtime2DataLocation
   object.  Make ParseResult load the output file and return its XML
   data.  Make UnparseResult save and return the unparse output file's
   length as its finalBitPos0b field for roundtrip processing by TDML
   tests.
   
   In daffodil-core's
   runtime2/generators/BinaryNumberParserGenerator.scala, implement a
   BinaryIntegerKnownLengthParserGenerator class with a generateCode
   method that generates the C code needed to initialize, parse, and
   unparse 32-bit integer fields.  Initialize the field to the memory bit
   pattern 0xCDCDCDCD since I'd already had to fix a bug that was leaving
   fields uninitialized; this distinctive bit pattern should make such
   bugs more obvious.  TODO: Make the generated C code as secure as
   possible using Language-Theoretic Security functions if possible.
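   
   The initialization idea above can be sketched as follows.  This is
   a minimal illustration under stated assumptions: the struct
   SketchFields and the functions sketch_init and
   sketch_looks_uninitialized are hypothetical names, not code from
   this pull request.
   
   ```c
   #include <stdbool.h> // for bool
   #include <stdint.h>  // for int32_t, uint32_t
   #include <string.h>  // for memcpy, memset
   
   // Hypothetical infoset fragment with two 32-bit fields (assumption)
   typedef struct
   {
       int32_t e2;
       int32_t e3;
   } SketchFields;
   
   // Fill every byte with 0xCD so each 32-bit field holds 0xCDCDCDCD
   void
   sketch_init(SketchFields *fields)
   {
       memset(fields, 0xCD, sizeof *fields);
   }
   
   // Debugging aid only: a field still holding the fill pattern was
   // probably never assigned, although a real value could coincide
   bool
   sketch_looks_uninitialized(int32_t field)
   {
       uint32_t bits;
       memcpy(&bits, &field, sizeof bits);
       return bits == 0xCDCDCDCDu;
   }
   ```
   
   Read as a signed 32-bit integer the pattern is -842150451, a value
   unlikely to appear in small test data, which is what makes a
   forgotten assignment stand out in test output.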
   
   In daffodil-core's runtime2/generators/ElementParserGenerator.scala,
   implement an ElementParserGenerator class with a generateCode method
   that makes the CodeGeneratorState calls needed for both complex and
   simple elements.
   
   In daffodil-core's runtime2/generators/ParserGenerator.scala,
   implement a ParserGenerator trait with a generateCode method and
   implement a CodeGeneratorState class with many methods to generate and
   accumulate strings of generated C code.  Implement a ComplexCGState
   class to accumulate strings of generated C code for nested elements
   inside complex elements.  TODO: In defineQNameInit, we try to optimize
   away a single extraneous xmlns=ns declaration in a child element when
   its parent has the same xmlns=ns declaration, but our approach doesn't
   handle multiple xmlns=ns declarations and has not been tested on corner
   cases yet.
   
   In daffodil-core's runtime2/TestGeneratedCodeCompiler.scala, write
   methods to test GeneratedCodeCompiler's compile method and
   Runtime2DataProcessor's parse and unparse methods.
   
   In daffodil-core's tdml.xsd, add "daffodil-runtime2" as a new TDML
   implementation enumeration value alongside "daffodil" and "ibm".
   
   In daffodil-propgen's dafext.xsd, add "tdmlImplementation" as a new
   tunable with default value "daffodil".  See also the code in
   daffodil-tdml-lib's TDMLRunner.scala which instantiates three
   different TDMLDFDLProcessorFactory implementations depending on the
   tdmlImplementation tunable's value ("daffodil", "daffodil-runtime2",
   or "ibm").  Also add "runtime" as a new tunable with default value
   "runtime1" and allowed value "runtime2".  TODO: Need to define usage
   for this "runtime" tunable and implement its usage in the rest of
   daffodil.  Does it make sense to use a tunable set to "runtime2"
   instead of "runtime1" when running "daffodil parse ..." or "daffodil
   unparse ..." from the command line?  Dynamically generating, building,
   and running a C-based executable in runtime2 may not speed up these
   commands very much compared to runtime1's Scala code.  Perhaps we
   should add a new "daffodil generate ..."  subcommand which will
   generate C code from a given DFDL schema so you can use that C code to
   build your own application.
   
   In daffodil-runtime1's DFDLParserUnparser.scala, split the original
   DataProcessor trait into a DataProcessorBase trait without the
   WithDiagnostics trait or parse/unparse methods and a DataProcessor
   trait extending DataProcessorBase and adding the WithDiagnostics trait
   along with parse/unparse methods.  The reason is to allow
   Runtime2DataProcessor to extend DataProcessorBase and add its own
   parse/unparse methods with different parameters and return types
   without having to implement WithDiagnostics.  Also add a
   CodeGeneratorState trait with no methods which will be extended by
   runtime2's CodeGeneratorState class in case we need to modularize
   runtime2 for pluggability.  TODO: Should we make runtime2 pluggable?
   
   In daffodil-runtime2's .clang-format, define the C coding style to be
   used when formatting the runtime2 C files.  We are using the Barr
   Group's Embedded C style recommendations:
   
      - braces on their own lines, BSD/Allman style
      - indent 4 spaces (no tab characters)
      - align decl names on first char
      - put function definition names in first column
   
   Note we also run include-what-you-use (iwyu) on the runtime2 C files
   to make sure each file has all the #includes for everything it uses
   while removing any extraneous #includes.
   
   In daffodil-runtime2's .vscode/launch.json and tasks.json, tell Visual
   Studio Code how to compile and debug the runtime2 C files (used only
   to make development/editing of these files easier).
   
   In daffodil-runtime2's common_runtime.{c,h}, implement a walkInfoset
   method to walk a runtime2 infoset while calling VisitEventHandler
   methods, and define runtime2 common types and structs such as
   NamedQName, TypeCode, ElementRuntimeData, InfosetBase, PState, UState,
   and VisitEventHandler.
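   
   To make the walking idea concrete, here is a deliberately
   simplified sketch of a metadata-driven walk.  All names prefixed
   with Sketch or sketch_ are hypothetical, and the sketch passes the
   metadata alongside the node instead of storing it inside each node,
   so it is not the actual walkInfoset or VisitEventHandler code.
   
   ```c
   #include <stddef.h> // for offsetof, size_t, NULL
   #include <stdint.h> // for int32_t, int64_t
   
   typedef enum { SKETCH_COMPLEX, SKETCH_INT32 } SketchTypeCode;
   
   // Simplified stand-in for element runtime data (assumption)
   typedef struct SketchERD
   {
       const char *                   name;
       SketchTypeCode                 typeCode;
       size_t                         numChildren;
       const size_t *                 offsets;      // child offsets in parent
       const struct SketchERD *const *childrenERDs; // child metadata
   } SketchERD;
   
   typedef struct { int32_t e2; int32_t e3; } SketchC2;
   typedef struct { int32_t e1; SketchC2 c2; } SketchC1;
   
   static const SketchERD sketch_e1_erd = {"e1", SKETCH_INT32, 0, NULL, NULL};
   static const SketchERD sketch_e2_erd = {"e2", SKETCH_INT32, 0, NULL, NULL};
   static const SketchERD sketch_e3_erd = {"e3", SKETCH_INT32, 0, NULL, NULL};
   
   static const size_t sketch_c2_offsets[2] = {offsetof(SketchC2, e2),
                                               offsetof(SketchC2, e3)};
   static const SketchERD *const sketch_c2_children[2] = {&sketch_e2_erd,
                                                          &sketch_e3_erd};
   static const SketchERD sketch_c2_erd = {"c2", SKETCH_COMPLEX, 2,
                                           sketch_c2_offsets,
                                           sketch_c2_children};
   
   static const size_t sketch_c1_offsets[2] = {offsetof(SketchC1, e1),
                                               offsetof(SketchC1, c2)};
   static const SketchERD *const sketch_c1_children[2] = {&sketch_e1_erd,
                                                          &sketch_c2_erd};
   const SketchERD sketch_c1_erd = {"c1", SKETCH_COMPLEX, 2,
                                    sketch_c1_offsets, sketch_c1_children};
   
   // Callback invoked for each simple int32 element during the walk
   typedef void (*SketchVisitInt32)(const SketchERD *erd,
                                    const int32_t *value, void *ctx);
   
   // Depth-first walk: visit simple elements, recurse into complex ones
   void
   sketch_walk(const SketchERD *erd, const void *node,
               SketchVisitInt32 visit, void *ctx)
   {
       if (erd->typeCode == SKETCH_INT32)
       {
           visit(erd, (const int32_t *)node, ctx);
           return;
       }
       for (size_t i = 0; i < erd->numChildren; i++)
       {
           const char *child = (const char *)node + erd->offsets[i];
           sketch_walk(erd->childrenERDs[i], child, visit, ctx);
       }
   }
   
   // Example visitor: add each visited value to an int64_t total
   void
   sketch_sum(const SketchERD *erd, const int32_t *value, void *ctx)
   {
       (void)erd;
       *(int64_t *)ctx += *value;
   }
   ```
   
   Because the walk only consults the metadata tables, the same
   function can serve any generated struct layout; only the tables
   change per schema.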
   
   In daffodil-runtime2's daffodil_argp.{c,h}, implement all the code
   needed to support the runtime2 executable's "daffodil parse" and
   "daffodil unparse" command line interface arguments (following
   daffodil's Scala CLI syntax as closely as possible).
   
   In daffodil-runtime2's daffodil_main.c, implement the runtime2
   executable's main method which doesn't need to know anything about the
   generated C code except how to initialize it by calling a
   rootInfoset() method.  The only C files which need to be generated by
   runtime2 are the two files "generated_code.h" and "generated_code.c".
   
   In daffodil-runtime2's generated_code.{c,h}, please note the files
   checked in here were originally mockups made in Visual Studio Code for
   design and debugging purposes.  Depending on whether your DFDL schema
   is the same or different, runtime2 will generate a pair of
   generated_code.{c,h} files in a temporary directory with the same or
   different contents than these mockup files.  We are checking in these
   files only to make it easier to continue changing and debugging the
   generated code in Visual Studio Code.
   
   In daffodil-runtime2's stack.{c,h}, implement a stack used by
   xml_writer.c to build an XML document while traversing the in-memory
   infoset.  Use statically allocated storage, not heap allocated
   storage, as an exercise in case we might need to use stack.c in
   another part of the runtime2 C code on an embedded device with limited
   memory.  However, the Mini-XML library requires heap allocated storage
   anyway so switch stack.c to heap allocated storage later (TODO) if it
   turns out we use mxml and stack only for TDML tests.
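   
   For readers unfamiliar with the trade-off, a fixed-capacity stack
   with no heap allocation can be sketched as below.  The type
   SketchStack and its functions are hypothetical and do not reproduce
   the stack.{c,h} API; the capacity of 100 simply mirrors the maximum
   depth mentioned for xml_writer.c.
   
   ```c
   #include <stdbool.h> // for bool
   #include <stddef.h>  // for size_t
   
   enum { SKETCH_STACK_MAX = 100 }; // assumed fixed capacity
   
   // All storage lives inside the struct, so no malloc is needed
   typedef struct
   {
       const void *items[SKETCH_STACK_MAX];
       size_t      count;
   } SketchStack;
   
   void
   sketch_stack_init(SketchStack *stack)
   {
       stack->count = 0;
   }
   
   // Returns false instead of overflowing when the stack is full
   bool
   sketch_stack_push(SketchStack *stack, const void *item)
   {
       if (stack->count >= SKETCH_STACK_MAX)
       {
           return false;
       }
       stack->items[stack->count++] = item;
       return true;
   }
   
   // Returns false when the stack is empty; otherwise stores the top
   // item in *item and removes it
   bool
   sketch_stack_pop(SketchStack *stack, const void **item)
   {
       if (stack->count == 0)
       {
           return false;
       }
       *item = stack->items[--stack->count];
       return true;
   }
   ```
   
   The cost of this design is a hard limit on nesting depth; a
   heap-allocated variant would trade that limit for a dependency on
   malloc.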
   
   In daffodil-runtime2's xml_reader.{c,h}, implement VisitEventHandler
   methods to walk a runtime2 infoset and use XML data from an input
   file to initialize the in-memory infoset.
   
   In daffodil-runtime2's xml_writer.{c,h}, implement VisitEventHandler
   methods to walk a runtime2 infoset, push nested XML nodes on a stack,
   and write the complete XML data to an output file when the walk is
   complete.  The stack has a statically defined maximum depth of 100
   nested nodes right now which probably will be changed later.
   
   In daffodil-runtime2's TestRuntime2.dfdl.xsd, define an example DFDL
   schema to be used for tests.  Right now the schema has only one
   top-level complex element containing three xs:int simple elements to
   be parsed and unparsed.
   
   In daffodil-runtime2's TestRuntime2.tdml, define a suite of TDML tests
   with both config-runtime1 and config-runtime2 configurations and a
   defaultConfig that selects one of them so you can run the TDML tests
   with either runtime1 or runtime2.  Right now we have only two
   "parse_int32" and "unparse_int32" tests with corresponding
   "parse_int32" and "unparse_int32" files to be parsed/unparsed.
   
   In daffodil-runtime2's TestRuntime2.scala, define a TDML runner to run
   the runtime2 suite of TDML tests from the "sbt test" command line.
   
   In daffodil-tdml-lib's TDMLRunner.scala, extend the TestCase
   tdmlDFDLProcessorFactory method to allow three different
   TDMLDFDLProcessorFactory implementations to be used depending on the
   corresponding value of the tdmlImplementation tunable ("daffodil",
   "daffodil-runtime2", or "ibm").  Fix a typo in UnparseTestCase's
   roundtrip error message.
   
   In daffodil-tdml-processor's Runtime2TDMLDFDLProcessor.scala,
   implement a TDMLDFDLProcessorFactory class with implementationName
   "daffodil-runtime2" and a getProcessor method which runs
   GeneratedCodeCompiler's compile method and returns a
   Runtime2TDMLDFDLProcessor ready to run the executable.  Implement a
   Runtime2TDMLDFDLProcessor class with parse and unparse methods which
   run Runtime2DataProcessor's parse and unparse methods and return
   Runtime2TDMLParseResult and Runtime2TDMLUnparseResult objects.
   Implement Runtime2TDMLParseResult and Runtime2TDMLUnparseResult as
   wrapper classes around runtime2's ParseResult and UnparseResult
   classes.
   
   In project/Dependencies.scala, add a com.lihaoyi %% os-lib dependency
   to let GeneratedCodeCompiler and Runtime2DataProcessor create the
   files they need to write or read and call the os commands they need to
   compile the C code and run the executable.
   
   In project/Rat.scala, fix a typo.
   
   In project/plugins.sbt, make sbt use com.github.tnakamot % sbt-cc as
   one of its plugins.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-daffodil] tuxji commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
tuxji commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r526947840



##########
File path: .gitignore
##########
@@ -52,23 +43,35 @@ daffodil-extra/*
 .idea_modules
 *.iml
 #
-# FOR EMACS ENSIME
+# For Emacs & ensime
 #
-.ensime_cache
-# these are autosave emacs files
-\#*#
+*~
 .\#*
 .\#.*
+.ensime_cache
+\#*#
 #
-# FOR VIM
+# For vim
 #
 .*.swp
 #
 # For Eclipse
 #
 .classpath
 .project
-bin/    
+.settings
 /eclipse-projects
-.settings/
-
+bin
+#
+# For C, Scala Metal, Visual Studio Code & zig cc
+#
+*.a
+*.o
+.bloop
+.metals
+.vscode
+GPATH
+GRTAGS
+GTAGS
+daffodil-runtime2/src/main/resources/daffodil
+zig-cache

Review comment:
       This .gitignore is an old version.  I know the main branch has removed most IDE-specific files from this .gitignore in a more recent pull request and I will keep this .gitignore as clean as possible when I rebase runtime-2202 to keep up with changes to the main branch.

##########
File path: build.sbt
##########
@@ -68,9 +100,9 @@ lazy val tdmlProc         = Project("daffodil-tdml-processor", file("daffodil-td
                               .settings(commonSettings)
 
 lazy val cli              = Project("daffodil-cli", file("daffodil-cli")).configs(IntegrationTest)
-                              .dependsOn(tdmlProc, sapi, japi, udf % "it->test") // causes sapi/japi to be pulled in to the helper zip/tar
+                              .dependsOn(tdmlProc, runtime2, sapi, japi, udf % "it->test") // causes runtime2/sapi/japi to be pulled in to the helper zip/tar

Review comment:
       Even though we're using Class.forName to avoid any compile time dependency from daffodil core or cli classes on runtime2 classes, we still need to ensure that the runtime2 classes are included in the daffodil cli's classpath.

##########
File path: build.sbt
##########
@@ -43,6 +47,34 @@ lazy val runtime1         = Project("daffodil-runtime1", file("daffodil-runtime1
                               .dependsOn(io, lib % "test->test", udf, macroLib % "compile-internal, test-internal")
                               .settings(commonSettings, usesMacros)
 
+val runtime2CFiles        = Library("libruntime2.a")
+lazy val runtime2         = Project("daffodil-runtime2", file("daffodil-runtime2")).configs(IntegrationTest)
+                              .enablePlugins(CcPlugin)
+                              .dependsOn(core, core % "test->test", tdmlProc)
+                              .settings(commonSettings)
+                              .settings(publishArtifact in (Compile, packageDoc) := false)
+                              .settings(
+                                Compile / ccTargets := ListSet(runtime2CFiles),
+                                Compile / cSources  := Map(
+                                  runtime2CFiles -> (
+                                    ((Compile / resourceDirectory).value / "c" * GlobFilter("*.c")).get() ++
+                                    ((Compile / resourceDirectory).value / "examples" * GlobFilter("*.c")).get()
+                                  )
+                                ),
+                                Compile / cIncludeDirectories := Map(
+                                  runtime2CFiles -> Seq(
+                                    (Compile / resourceDirectory).value / "c",
+                                    (Compile / resourceDirectory).value / "examples"
+                                  )
+                                ),
+                                Compile / cFlags := (Compile / cFlags).value.withDefaultValue(Seq(
+                                  "-g",
+                                  "-Wall",
+                                  "-Wextra",
+                                  "-Wno-missing-field-initializers",
+                                ))
+                              )
+

Review comment:
       Strictly speaking, we don't need to compile the C source files in "sbt compile" because nothing uses their object files (our runtime2 code always builds an executable directly from C source files at runtime).  The goal of these sbt-cc settings is to warn developers as quickly as possible if they change something in the C source files that doesn't compile correctly; otherwise, developers may not realize it until they run the runtime2 unit tests.  I'm willing to take out these sbt-cc settings if some developers don't want to have any C compiler installed on their systems.  Then only developers working with the C source files will need to install a C compiler, and they can get early warnings by using an IDE with a C plugin like Visual Studio Code to edit the C source files.

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -1337,11 +1398,42 @@ object Main extends Logging {
         0
       }
 
-      case _ => {
-        // This should never happen, this is caught by validation
-        Assert.impossible()
-        // 1
+      case Some(conf.generate) => {
+        conf.subcommands match {
+          case List(conf.generate, conf.generate.c) => {
+            val generateOpts = conf.generate.c
+
+            // Read any config file and any tunables given as arguments
+            val cfgFileNode = generateOpts.config.toOption match {
+              case None => None
+              case Some(pathToConfig) => Some(this.loadConfigurationFile(pathToConfig))
+            }
+            val tunables = retrieveTunables(generateOpts.tunables, cfgFileNode)
+
+            // Create a CodeGenerator from the DFDL schema
+            val generator = createGeneratorFromSchema(generateOpts.schema(), generateOpts.rootNS.toOption,
+              tunables, generateOpts.language)
+
+            // Ask the CodeGenerator to generate source code from the DFDL schema
+            val rootNS = generateOpts.rootNS.toOption
+            val outputDir = generateOpts.outputDir.toOption.getOrElse(".")
+            val rc = generator match {
+              case Some(generator) => {
+                Timer.getResult("generating", generator.generateCode(rootNS, outputDir))
+                displayDiagnostics(generator)
+                if (generator.isError) 1 else 0
+              }
+              case None => 1
+            }
+            rc
+          }
+          // Required to avoid "match may not be exhaustive", but should never happen
+          case _ => Assert.impossible()
+        }

Review comment:
       We can generate code for more languages by adding more cases here.  The code inside the case probably would need no change except for the first two lines, so if we add another case we should extract the code in our "c" case into a function that can be called from another language-specific case.
   
   ```
             case List(conf.generate, conf.generate.c) => {
               val generateOpts = conf.generate.c
   ```
   

##########
File path: daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd
##########
@@ -388,6 +396,14 @@
             </xs:documentation>
           </xs:annotation>
         </xs:element>
+        <xs:element name="tdmlImplementation" type="xs:string" default="daffodil" minOccurs="0">
+          <xs:annotation>
+            <xs:documentation>
+              TDMLDFDLProcessorFactory implementation to use when running TDML tests.
+              Allowed values are "daffodil" (default), "daffodil-runtime2", and "ibm".
+            </xs:documentation>
+          </xs:annotation>
+        </xs:element>
         <xs:element name="unqualifiedPathStepPolicy" type="daf:TunableUnqualifiedPathStepPolicy" default="noNamespace" minOccurs="0">

Review comment:
       TDML tests can set this tunable to tell Daffodil which TDML processor class should be used to run them (they already have to say which TDML implementations they can work with if they want to be compatible with multiple processors).  Otherwise, we would have to use a different CLI executable to run TDML tests with runtime2's TDML processor class, or else modify `daffodil test <tdml>` to accept an option similar to this tunable.

##########
File path: daffodil-runtime2/src/main/resources/examples/ex_int32.c
##########
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "ex_int32.h" // for generated code structs
+#include <endian.h>   // for be32toh, htobe32
+#include <stddef.h>   // for ptrdiff_t
+#include <stdio.h>    // for NULL, fread, fwrite, size_t, FILE
+
+// Prototypes needed for compilation
+
+static void        c2_initSelf(c2 *instance);
+static const char *c2_parseSelf(c2 *instance, const PState *pstate);
+static const char *c2_unparseSelf(const c2 *instance, const UState *ustate);
+static void        c1_initSelf(c1 *instance);
+static const char *c1_parseSelf(c1 *instance, const PState *pstate);
+static const char *c1_unparseSelf(const c1 *instance, const UState *ustate);
+
+// Metadata singletons
+
+static const ERD e1_ERD = {
+    {
+        "ex", // namedQName.prefix
+        "e1", // namedQName.local
+        NULL, // namedQName.ns
+    },
+    PRIMITIVE_INT32, // typeCode
+    0,               // numChildren
+    NULL,            // offsets
+    NULL,            // childrenERDs
+    NULL,            // initSelf
+    NULL,            // parseSelf
+    NULL,            // unparseSelf
+};
+
+static const ERD e2_ERD = {
+    {
+        "ex", // namedQName.prefix
+        "e2", // namedQName.local
+        NULL, // namedQName.ns
+    },
+    PRIMITIVE_INT32, // typeCode
+    0,               // numChildren
+    NULL,            // offsets
+    NULL,            // childrenERDs
+    NULL,            // initSelf
+    NULL,            // parseSelf
+    NULL,            // unparseSelf
+};
+
+static const ERD e3_ERD = {
+    {
+        "ex", // namedQName.prefix
+        "e3", // namedQName.local
+        NULL, // namedQName.ns
+    },
+    PRIMITIVE_INT32, // typeCode
+    0,               // numChildren
+    NULL,            // offsets
+    NULL,            // childrenERDs
+    NULL,            // initSelf
+    NULL,            // parseSelf
+    NULL,            // unparseSelf
+};
+
+static const c2 c2_compute_ERD_offsets;
+
+static const ptrdiff_t c2_offsets[2] = {
+    (char *)&c2_compute_ERD_offsets.e2 - (char *)&c2_compute_ERD_offsets,
+    (char *)&c2_compute_ERD_offsets.e3 - (char *)&c2_compute_ERD_offsets};
+
+static const ERD *c2_childrenERDs[2] = {&e2_ERD, &e3_ERD};
+
+static const ERD c2_ERD = {
+    {
+        "ex", // namedQName.prefix
+        "c2", // namedQName.local
+        NULL, // namedQName.ns
+    },
+    COMPLEX,                         // typeCode
+    2,                               // numChildren
+    c2_offsets,                      // offsets
+    c2_childrenERDs,                 // childrenERDs
+    (ERDInitSelf)&c2_initSelf,       // initSelf
+    (ERDParseSelf)&c2_parseSelf,     // parseSelf
+    (ERDUnparseSelf)&c2_unparseSelf, // unparseSelf
+};
+
+static const c1 c1_compute_ERD_offsets;
+
+static const ptrdiff_t c1_offsets[2] = {
+    (char *)&c1_compute_ERD_offsets.e1 - (char *)&c1_compute_ERD_offsets,
+    (char *)&c1_compute_ERD_offsets.c2 - (char *)&c1_compute_ERD_offsets};
+
+static const ERD *c1_childrenERDs[2] = {&e1_ERD, &c2_ERD};
+
+static const ERD c1_ERD = {
+    {
+        "ex",                 // namedQName.prefix
+        "c1",                 // namedQName.local
+        "http://example.com", // namedQName.ns
+    },
+    COMPLEX,                         // typeCode
+    2,                               // numChildren
+    c1_offsets,                      // offsets
+    c1_childrenERDs,                 // childrenERDs
+    (ERDInitSelf)&c1_initSelf,       // initSelf
+    (ERDParseSelf)&c1_parseSelf,     // parseSelf
+    (ERDUnparseSelf)&c1_unparseSelf, // unparseSelf
+};
+
+// Return a root element to be used for parsing or unparsing
+
+InfosetBase *
+rootElement()
+{
+    static c1    instance;
+    InfosetBase *root = &instance._base;
+    c1_ERD.initSelf(root);
+    return root;
+}
+
+// Methods to initialize, parse, and unparse infoset nodes
+
+static void
+c2_initSelf(c2 *instance)
+{
+    instance->e2 = 0xCDCDCDCD;
+    instance->e3 = 0xCDCDCDCD;
+    instance->_base.erd = &c2_ERD;
+}
+
+static const char *
+c2_parseSelf(c2 *instance, const PState *pstate)
+{
+    const char *error_msg = NULL;
+    if (!error_msg)
+    {
+        char   buffer[4];
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+        if (count < sizeof(buffer))
+        {
+            error_msg = eof_or_error_msg(pstate->stream);
+        }
+        instance->e2 = be32toh(*((uint32_t *)(&buffer)));
+    }
+    if (!error_msg)
+    {
+        char   buffer[4];
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+        if (count < sizeof(buffer))
+        {
+            error_msg = eof_or_error_msg(pstate->stream);
+        }
+        instance->e3 = be32toh(*((uint32_t *)(&buffer)));
+    }
+    return error_msg;
+}
+
+static const char *
+c2_unparseSelf(const c2 *instance, const UState *ustate)
+{
+    const char *error_msg = NULL;
+    if (!error_msg)
+    {
+        union
+        {
+            char     c_val[4];
+            uint32_t i_val;
+        } buffer;
+        buffer.i_val = htobe32(instance->e2);
+        size_t count = fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);
+        if (count < sizeof(buffer))
+        {
+            error_msg = eof_or_error_msg(ustate->stream);
+        }
+    }
+    if (!error_msg)
+    {
+        union
+        {
+            char     c_val[4];
+            uint32_t i_val;
+        } buffer;
+        buffer.i_val = htobe32(instance->e3);
+        size_t count = fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);
+        if (count < sizeof(buffer))
+        {
+            error_msg = eof_or_error_msg(ustate->stream);
+        }
+    }
+    return error_msg;
+}
+
+static void
+c1_initSelf(c1 *instance)
+{
+    instance->e1 = 0xCDCDCDCD;
+    c2_initSelf(&instance->c2);
+    instance->_base.erd = &c1_ERD;
+}
+
+static const char *
+c1_parseSelf(c1 *instance, const PState *pstate)
+{
+    const char *error_msg = NULL;
+    if (!error_msg)
+    {
+        char   buffer[4];
+        size_t count = fread(&buffer, 1, sizeof(buffer), pstate->stream);
+        if (count < sizeof(buffer))
+        {
+            error_msg = eof_or_error_msg(pstate->stream);
+        }
+        instance->e1 = be32toh(*((uint32_t *)(&buffer)));
+    }
+    if (!error_msg)
+    {
+        error_msg = c2_parseSelf(&instance->c2, pstate);
+    }
+    return error_msg;
+}
+
+static const char *
+c1_unparseSelf(const c1 *instance, const UState *ustate)
+{
+    const char *error_msg = NULL;
+    if (!error_msg)
+    {
+        union
+        {
+            char     c_val[4];
+            uint32_t i_val;
+        } buffer;
+        buffer.i_val = htobe32(instance->e1);
+        size_t count = fwrite(buffer.c_val, 1, sizeof(buffer), ustate->stream);
+        if (count < sizeof(buffer))
+        {
+            error_msg = eof_or_error_msg(ustate->stream);
+        }
+    }
+    if (!error_msg)
+    {
+        error_msg = c2_unparseSelf(&instance->c2, ustate);
+    }
+    return error_msg;
+}

Review comment:
       This C source file's directory and name now clearly label it as a generated code example.  You can link it with the files in the `c` directory to debug and test any part of the code.  Over time, we'll create more generated code examples and put them in this directory too.
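
       As a hedged illustration of the per-element read and write steps in the generated code above, the sketch below parses and unparses one big-endian 32-bit integer without `<endian.h>` and without casting a `char` buffer to `uint32_t *`.  It is not the code in this pull request: `read_be_int32`, `write_be_int32`, and `eof_or_error_sketch` are hypothetical names, and the error strings are made up for the example.

```c
#include <assert.h> // for assert
#include <stdint.h> // for int32_t, uint32_t, uint8_t
#include <stdio.h>  // for FILE, feof, fread, fwrite

// Hypothetical stand-in for eof_or_error_msg (its real definition is not
// shown in this thread); the message strings are assumptions
static const char *
eof_or_error_sketch(FILE *stream)
{
    return feof(stream) ? "found eof indicator on stream"
                        : "found error indicator on stream";
}

// Reads one big-endian int32; returns NULL on success, else an error message
static const char *
read_be_int32(FILE *stream, int32_t *value)
{
    assert(stream != NULL && value != NULL);
    uint8_t buffer[4];
    if (fread(buffer, 1, sizeof(buffer), stream) < sizeof(buffer))
    {
        return eof_or_error_sketch(stream);
    }
    // The most significant byte comes first in big-endian data
    uint32_t bits = ((uint32_t)buffer[0] << 24) | ((uint32_t)buffer[1] << 16) |
                    ((uint32_t)buffer[2] << 8) | (uint32_t)buffer[3];
    *value = (int32_t)bits; // relies on the usual two's complement conversion
    return NULL;
}

// Writes one big-endian int32; returns NULL on success, else an error message
static const char *
write_be_int32(FILE *stream, int32_t value)
{
    assert(stream != NULL);
    uint32_t bits = (uint32_t)value;
    uint8_t  buffer[4] = {(uint8_t)(bits >> 24), (uint8_t)(bits >> 16),
                          (uint8_t)(bits >> 8), (uint8_t)bits};
    if (fwrite(buffer, 1, sizeof(buffer), stream) < sizeof(buffer))
    {
        return eof_or_error_sketch(stream);
    }
    return NULL;
}
```

       Assembling the value from individual bytes sidesteps alignment and host byte order at the cost of a few shifts; whether the generator should emit this form is left open here.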

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -728,6 +764,31 @@ object Main extends Logging {
     pf
   }
 
+  def createGeneratorFromSchema(schema: URI, rootNS: Option[RefQName], tunables: Map[String, String],
+                                language: String): Option[DFDL.CodeGenerator] = {
+    val compiler = {
+      val c = Compiler().withTunables(tunables)
+      rootNS match {
+        case None => c
+        case Some(RefQName(_, root, ns)) => c.withDistinguishedRootNode(root, ns.toStringOrNullIfNoNS)
+      }
+    }
+
+    val schemaSource = URISchemaSource(schema)
+    val cg = Timer.getResult("compiling", {
+      val processorFactory = compiler.compileSource(schemaSource)
+      if (!processorFactory.isError) {
+        val generator = processorFactory.forLanguage(language)
+        displayDiagnostics(generator)
+        Some(generator)
+      } else {
+        displayDiagnostics(processorFactory)
+        None
+      }
+    })
+    cg
+  }
+

Review comment:
       Creating a generator from a schema is now similar to creating a processor from a schema; both use the same factory.  You compile the schema first to get a factory, then you call `processorFactory.forLanguage(language)` to get a code generator instead of calling `processorFactory.onPath(path)` to get a data processor.

##########
File path: daffodil-propgen/src/main/resources/org/apache/daffodil/xsd/dafext.xsd
##########
@@ -380,6 +380,14 @@
             </xs:documentation>
           </xs:annotation>
         </xs:element>
+        <xs:element name="runtime" type="xs:string" default="runtime1" minOccurs="0">
+          <xs:annotation>
+            <xs:documentation>
+              Runtime implementation to use when running daffodil parse or unparse.
+              Allowed values are "runtime1" (default) and "runtime2".
+            </xs:documentation>
+          </xs:annotation>
+        </xs:element>

Review comment:
       I will remove this tunable since nothing uses it and I've realized we don't need or want it.

##########
File path: daffodil-runtime2/src/main/resources/.clang-format
##########
@@ -0,0 +1,22 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+AlignConsecutiveDeclarations: true
+AllowShortFunctionsOnASingleLine: None
+AlwaysBreakAfterReturnType: TopLevelDefinitions
+BasedOnStyle: llvm
+BreakBeforeBraces: Allman
+IndentWidth: 4
+KeepEmptyLinesAtTheStartOfBlocks: false

Review comment:
       This clang-format file should be kept in source control so developers can format the C source files consistently.

##########
File path: daffodil-runtime2/src/main/resources/.vscode/launch.json
##########
@@ -0,0 +1,45 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+{
+    // Use IntelliSense to learn about possible attributes.
+    // Hover to view descriptions of existing attributes.
+    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
+    "version": "0.2.0",
+    "configurations": [
+
+        {
+            "name": "Debug daffodil",
+            "type": "cppdbg",
+            "request": "launch",
+            "program": "${workspaceFolder}/daffodil",
+            "args": ["parse", "../../test/resources/org/apache/daffodil/runtime2/parse_int32"],
+            "stopAtEntry": false,
+            "cwd": "${workspaceFolder}",
+            "environment": [],
+            "externalConsole": false,
+            "MIMode": "gdb",
+            "setupCommands": [
+                {
+                    "description": "Enable pretty-printing for gdb",
+                    "text": "-enable-pretty-printing",
+                    "ignoreFailures": true
+                }
+            ],
+            "preLaunchTask": "Build daffodil",
+            "miDebuggerPath": "/usr/bin/gdb"
+        }
+    ]
+}

Review comment:
       Now that I think about it, though, we probably shouldn't keep this IDE-specific file, or the other IDE-specific file below, in source control.

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/CodeGenerator.scala
##########
@@ -0,0 +1,177 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.File
+import java.nio.file.FileSystems
+import java.nio.file.Files
+import java.nio.file.Paths
+import java.util.Collections
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.api.Diagnostic
+import org.apache.daffodil.dsom.Root
+import org.apache.daffodil.dsom.SchemaDefinitionError
+import org.apache.daffodil.runtime2.generators.CodeGeneratorState
+import org.apache.daffodil.util.Misc
+import org.apache.daffodil.xml.RefQName
+
+/**
+ * Generates and compiles C source files from a DFDL schema encapsulated in a [[Root]].
+ * Implements the [[DFDL.CodeGenerator]] trait to allow it to be called by Daffodil code.
+ * @param root Provides the DFDL schema for code generation
+ */
+class CodeGenerator(root: Root) extends DFDL.CodeGenerator {
+  // Used by compileCode and pickCompiler methods
+  private lazy val isWindows = System.getProperty("os.name").toLowerCase().startsWith("windows")
+  // Used by WithDiagnostics methods
+  private var diagnostics: Seq[Diagnostic] = Nil
+  private var errorStatus: Boolean = false
+
+  // Writes C source files into a "c" subdirectory of the given output directory.
+  // Removes the "c" subdirectory if it existed before.
+  override def generateCode(rootNS: Option[RefQName], outputDirArg: String): os.Path = {
+    // Get the paths of the output directory and its code subdirectory
+    val outputDir = os.Path(Paths.get(outputDirArg).toAbsolutePath)
+    val codeDir = outputDir/"c"
+
+    // Ensure our output directory exists while our code subdirectory does not
+    os.makeDir.all(outputDir)
+    os.remove.all(codeDir)
+
+    // Copy our resource directory and all its C source files to our code subdirectory
+    val resourceUri = Misc.getRequiredResource("/c")
+    val fileSystem = if (resourceUri.getScheme == "jar")
+      FileSystems.newFileSystem(resourceUri, Collections.emptyMap(), null)
+    else
+      null
+    try {
+      val resourceDir = os.Path(if (fileSystem != null) fileSystem.getPath("/c") else Paths.get(resourceUri))
+      os.copy(resourceDir, codeDir)
+    }
+    finally
+      if (fileSystem != null) fileSystem.close()
+
+    // Generate C code from the DFDL schema
+    val rootElementName = rootNS.getOrElse(root.refQName).local
+    val codeGeneratorState = new CodeGeneratorState()
+    Runtime2CodeGenerator.generateCode(root.document, codeGeneratorState)
+    val codeHeaderText = codeGeneratorState.generateCodeHeader
+    val codeFileText = codeGeneratorState.generateCodeFile(rootElementName)
+
+    // Write the generated C code into our code subdirectory
+    val generatedCodeHeader = codeDir/"generated_code.h"
+    val generatedCodeFile = codeDir/"generated_code.c"
+    os.write(generatedCodeHeader, codeHeaderText)
+    os.write(generatedCodeFile, codeFileText)
+
+    // Return our output directory in case caller wants to call compileCode next
+    outputDir
+  }
+
+  // Compiles any C source files inside a "c" subdirectory of the given output directory.
+  // Returns the path of the newly created executable to use in TDML tests or something else.
+  override def compileCode(outputDir: os.Path): os.Path = {
+    // Get the paths of the code subdirectory and the executable we will build
+    val codeDir = outputDir/"c"
+    val exe = if (isWindows) codeDir/"daffodil.exe" else codeDir/"daffodil"
+
+    try {
+      // Assemble the compiler's command line arguments
+      val compiler = pickCompiler
+      val files = os.list(codeDir).filter(_.ext == "c")
+      val libs = Seq("-lmxml", if (isWindows) "-largp" else "-lpthread")
+
+      // Call the compiler if it was found.  We run the compiler in the output directory,
+      // not in the "c" subdirectory, in order to let the compiler (which might be "zig cc")
+      // cache/reuse previously built files (which might be in a "zig_cache" subdirectory).
+      // We can't let "zig_cache" be put into "c" because we always remove and re-generate
+      // everything in "c" from scratch.
+      if (compiler.nonEmpty) {
+        val result = os.proc(compiler, "-I", codeDir, files, libs, "-o", exe).call(cwd = outputDir, stderr = os.Pipe)
+
+        // Report any compiler output as a warning
+        if (result.out.text.nonEmpty || result.err.text.nonEmpty) {
+          warning("Unexpected compiler output on stdout: %s on stderr: %s", result.out.text, result.err.text)
+        }
+      }
+    } catch {
+      // Report any subprocess termination error as an error
+      case e: os.SubprocessException =>
+        error("Error compiling generated code: %s wd: %s", Misc.getSomeMessage(e).get, outputDir.toString)
+    }
+
+    // Report any failure to build the executable as an error
+    if (!os.exists(exe)) error("No executable was built: %s", exe.toString)
+    exe
+  }
+
+  /**
+   * Searches for any available C compiler on the system.  Tries to find the
+   * compiler given by `CC` if `CC` exists in the environment, then tries to
+   * find any compiler from the following list:
+   *
+   *   - zig cc
+   *   - gcc
+   *   - clang
+   *   - cc
+   *
+   * Returns the first compiler found as a sequence of strings in case the
+   * compiler is a program with a subcommand argument.  Returns the empty
+   * sequence if no compiler could be found in the user's PATH.
+   */
+  lazy val pickCompiler: Seq[String] = {
+    val ccEnv = System.getenv("CC")
+    val compilers = Seq(ccEnv, "zig cc", "gcc", "clang", "cc")
+    val path = System.getenv("PATH").split(File.pathSeparatorChar)
+    def inPath(compiler: String): Boolean = {
+      (compiler != null) && {
+        val exec = compiler.takeWhile(_ != ' ')
+        val exec2 = exec + ".exe"
+        path.exists(dir => Files.isExecutable(Paths.get(dir, exec))
+          || (isWindows && Files.isExecutable(Paths.get(dir, exec2))))
+      }
+    }
+    val compiler = compilers.find(inPath)
+    if (compiler.isDefined)
+      compiler.get.split(' ').toSeq
+    else
+      Seq.empty[String]
+  }
+
+  /**
+   * Adds a warning message to the diagnostics
+   */
+  def warning(formatString: String, args: Any*): Unit = {
+    val sde = new SchemaDefinitionError(None, None, formatString, args: _*)
+    diagnostics :+= sde
+  }
+
+  /**
+   * Adds an error message to the diagnostics and sets isError true
+   */
+  def error(formatString: String, args: Any*): Unit = {
+    val sde = new SchemaDefinitionError(None, None, formatString, args: _*)
+    diagnostics :+= sde
+    errorStatus = true
+  }
+
+  // Implements the WithDiagnostics methods
+  override def getDiagnostics: Seq[Diagnostic] = diagnostics
+  override def isError: Boolean = errorStatus
+}

Review comment:
       This is the core functionality that enables `daffodil generate c -s <schema> <outdir>`.  It also builds an executable to be run by TDML tests.  If you install [zig cc](https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html#:~:text=Install%20simply%20by%20unzipping%20a,%2C%20and%20you're%20done.) as your compiler, you will be able to run a large suite of TDML tests more quickly because the `zig cc` frontend will cache the object files compiled from the static C source files and compile only the generated code whenever it changes.
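
       For readers who think in C rather than Scala, the lookup that `pickCompiler` performs can be sketched roughly as below.  This is only an illustration of the idea (take the first word of the candidate, then test each `PATH` directory for an executable of that name); `compiler_in_path` is a hypothetical name, the sketch is POSIX-only, and it omits the Windows `.exe` handling.

```c
#include <assert.h> // for assert
#include <stdio.h>  // for snprintf
#include <string.h> // for strcspn
#include <unistd.h> // for access, X_OK (POSIX only)

// Hypothetical helper: returns 1 if the first word of `compiler` names an
// executable in one of the colon-separated directories in `path`, else 0
static int
compiler_in_path(const char *compiler, const char *path)
{
    assert(path != NULL);
    if (compiler == NULL)
    {
        return 0; // e.g. CC is not set in the environment
    }
    // Keep only the program name, e.g. "zig" from "zig cc"
    size_t exec_len = strcspn(compiler, " ");
    if (exec_len == 0)
    {
        return 0;
    }
    const char *dir = path;
    while (*dir != '\0')
    {
        size_t dir_len = strcspn(dir, ":");
        char   candidate[4096];
        int    n = snprintf(candidate, sizeof(candidate), "%.*s/%.*s",
                            (int)dir_len, dir, (int)exec_len, compiler);
        if (dir_len > 0 && n > 0 && (size_t)n < sizeof(candidate) &&
            access(candidate, X_OK) == 0)
        {
            return 1;
        }
        dir += dir_len; // move to the separator or the end of the string
        if (*dir == ':')
        {
            dir++;
        }
    }
    return 0;
}
```

       A caller would try each candidate in order (`CC`, then `zig cc`, `gcc`, `clang`, `cc`) and keep the first one for which the helper returns 1.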

##########
File path: daffodil-runtime2/src/main/resources/c/infoset.h
##########
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef INFOSET_H
+#define INFOSET_H
+
+#include <stddef.h> // for ptrdiff_t
+#include <stdint.h> // for int32_t
+#include <stdio.h>  // for FILE, size_t
+
+// Prototypes needed for compilation
+
+struct ElementRuntimeData;
+struct InfosetBase;
+struct PState;
+struct UState;
+struct VisitEventHandler;
+
+typedef struct ElementRuntimeData ERD;
+typedef struct InfosetBase        InfosetBase;
+typedef struct PState             PState;
+typedef struct UState             UState;
+typedef struct VisitEventHandler  VisitEventHandler;
+
+typedef void (*ERDInitSelf)(InfosetBase *infoNode);
+typedef const char *(*ERDParseSelf)(InfosetBase * infoNode,
+                                    const PState *pstate);
+typedef const char *(*ERDUnparseSelf)(const InfosetBase *infoNode,
+                                      const UState *     ustate);
+
+typedef const char *(*VisitStartDocument)(const VisitEventHandler *handler);
+typedef const char *(*VisitEndDocument)(const VisitEventHandler *handler);
+typedef const char *(*VisitStartComplex)(const VisitEventHandler *handler,
+                                         const InfosetBase *      base);
+typedef const char *(*VisitEndComplex)(const VisitEventHandler *handler,
+                                       const InfosetBase *      base);
+typedef const char *(*VisitInt32Elem)(const VisitEventHandler *handler,
+                                      const ERD *erd, const int32_t *location);
+
+// NamedQName - name of an infoset element
+
+typedef struct NamedQName
+{
+    const char *prefix; // prefix (optional, may be NULL)
+    const char *local;  // local name
+    const char *ns;     // namespace URI (optional, may be NULL)
+} NamedQName;
+
+// TypeCode - type of an infoset element
+
+enum TypeCode
+{
+    COMPLEX,
+    PRIMITIVE_INT32
+};
+
+// ERD - element runtime data needed to parse/unparse objects
+
+typedef struct ElementRuntimeData
+{
+    const NamedQName    namedQName;
+    const enum TypeCode typeCode;
+    const size_t        numChildren;
+    const ptrdiff_t *   offsets;
+    const ERD **        childrenERDs;
+
+    const ERDInitSelf    initSelf;
+    const ERDParseSelf   parseSelf;
+    const ERDUnparseSelf unparseSelf;
+} ERD;
+
+// InfosetBase - representation of an infoset element
+
+typedef struct InfosetBase
+{
+    const ERD *erd;
+} InfosetBase;
+
+// PState - parser state while parsing input
+
+typedef struct PState
+{
+    FILE *stream; // input to read from
+} PState;
+
+// UState - unparser state while unparsing infoset
+
+typedef struct UState
+{
+    FILE *stream; // output to write to
+} UState;
+
+// VisitEventHandler - methods to be called when walking an infoset
+
+typedef struct VisitEventHandler
+{
+    const VisitStartDocument visitStartDocument;
+    const VisitEndDocument   visitEndDocument;
+    const VisitStartComplex  visitStartComplex;
+    const VisitEndComplex    visitEndComplex;
+    const VisitInt32Elem     visitInt32Elem;
+} VisitEventHandler;
+
+// get_erd_name, get_erd_xmlns, get_erd_ns - get name and xmlns
+// attribute/value from ERD to use for XML element
+
+extern const char *get_erd_name(const ERD *erd);
+extern const char *get_erd_xmlns(const ERD *erd);
+extern const char *get_erd_ns(const ERD *erd);
+
+// rootElement - return a root element to walk while parsing or unparsing
+
+// (actual definition will be in generated_code.c, not infoset.c)
+extern InfosetBase *rootElement();
+
+// walkInfoset - walk an infoset and call VisitEventHandler methods
+
+extern const char *walkInfoset(const VisitEventHandler *handler,
+                               const InfosetBase *      infoset);
+
+// eof_or_error_msg - check if a stream has its eof or error indicator set
+
+extern const char *eof_or_error_msg(FILE *stream);
+
+#endif // INFOSET_H

Review comment:
       If you throw away the CLI and XML reader/writer, you're left with this infoset API and the actual generated code.
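
       To make the role of `offsets` and `childrenERDs` concrete, here is a minimal sketch of how a walker might use them to reach every int32 element of an infoset.  It is not the `walkInfoset` implementation from this pull request (which is not shown in this thread): the `ERD` struct is trimmed to the fields the sketch needs, `offsetof` replaces the `compute_ERD_offsets` idiom, and `walk_sketch`, `VisitInt32Sketch`, and `add_to_sum` are hypothetical names.

```c
#include <assert.h> // for assert
#include <stddef.h> // for NULL, offsetof, ptrdiff_t, size_t
#include <stdint.h> // for int32_t, int64_t

// Simplified stand-ins for the structs in infoset.h (assumption: trimmed)
enum TypeCode
{
    COMPLEX,
    PRIMITIVE_INT32
};

typedef struct ERD
{
    const char *       local;
    enum TypeCode      typeCode;
    size_t             numChildren;
    const ptrdiff_t *  offsets;
    const struct ERD **childrenERDs;
} ERD;

typedef struct InfosetBase
{
    const ERD *erd;
} InfosetBase;

typedef void (*VisitInt32Sketch)(void *context, const ERD *erd,
                                 const int32_t *location);

// Visits every int32 element under node, depth first, in schema order
static void
walk_sketch(const InfosetBase *node, VisitInt32Sketch visit, void *context)
{
    assert(node != NULL && node->erd != NULL && visit != NULL);
    const ERD *erd = node->erd;
    for (size_t i = 0; i < erd->numChildren; i++)
    {
        // Each offset is the child's byte distance from the parent struct
        const void *child = (const char *)node + erd->offsets[i];
        const ERD * childERD = erd->childrenERDs[i];
        if (childERD->typeCode == COMPLEX)
            walk_sketch(child, visit, context);
        else
            visit(context, childERD, child);
    }
}

// Example handler: adds each visited value to an int64_t accumulator
static void
add_to_sum(void *context, const ERD *erd, const int32_t *location)
{
    (void)erd;
    *(int64_t *)context += *location;
}

// Structs and metadata mirroring the c1/c2 example from the generated code
typedef struct c2
{
    InfosetBase _base;
    int32_t     e2;
    int32_t     e3;
} c2;

typedef struct c1
{
    InfosetBase _base;
    int32_t     e1;
    c2          c2;
} c1;

static const ERD e1_ERD = {"e1", PRIMITIVE_INT32, 0, NULL, NULL};
static const ERD e2_ERD = {"e2", PRIMITIVE_INT32, 0, NULL, NULL};
static const ERD e3_ERD = {"e3", PRIMITIVE_INT32, 0, NULL, NULL};

static const ptrdiff_t c2_offsets[2] = {(ptrdiff_t)offsetof(c2, e2),
                                        (ptrdiff_t)offsetof(c2, e3)};
static const ERD *     c2_childrenERDs[2] = {&e2_ERD, &e3_ERD};
static const ERD c2_ERD = {"c2", COMPLEX, 2, c2_offsets, c2_childrenERDs};

static const ptrdiff_t c1_offsets[2] = {(ptrdiff_t)offsetof(c1, e1),
                                        (ptrdiff_t)offsetof(c1, c2)};
static const ERD *     c1_childrenERDs[2] = {&e1_ERD, &c2_ERD};
static const ERD c1_ERD = {"c1", COMPLEX, 2, c1_offsets, c1_childrenERDs};
```

       Because the walker only needs the metadata tables, the same function works for any generated struct whose first member is an `InfosetBase`; an XML writer would plug in a different handler in place of `add_to_sum`.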

##########
File path: daffodil-runtime2/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala
##########
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.File
+import java.io.InputStream
+import java.io.OutputStream
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.api.DaffodilTunables
+import org.apache.daffodil.api.DataLocation
+import org.apache.daffodil.api.ValidationMode
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.processors.Failure
+import org.apache.daffodil.processors.ProcessorResult
+import org.apache.daffodil.processors.Success
+import org.apache.daffodil.processors.VariableMap
+import org.apache.daffodil.processors.WithDiagnosticsImpl
+import org.apache.daffodil.processors.parsers.ParseError
+import org.apache.daffodil.processors.unparsers.UnparseError
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+
+/**
+ * Effectively a scala proxy object that does its work via the underlying C-code.
+ * Will need to consider how to use features of underlying C-code to get infoset,
+ * walk infoset, generate XML for use by TDML tests.
+ */
+class Runtime2DataProcessor(executableFile: os.Path) extends DFDL.DataProcessorBase {
+
+  override def withValidationMode(mode: ValidationMode.Type): DFDL.DataProcessor = ???
+
+  override def withTunable(name: String, value: String): DFDL.DataProcessor = ???
+
+  override def withTunables(tunables: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: File): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Seq[Binding]): DFDL.DataProcessor = ???
+
+  override def validationMode: ValidationMode.Type = ???
+
+  override def getTunables(): DaffodilTunables = ???
+
+  override def save(output: DFDL.Output): Unit = ???
+
+  override def variableMap: VariableMap = ???
+
+  override def setValidationMode(mode: ValidationMode.Type): Unit = ???
+
+  override def setExternalVariables(extVars: Map[String, String]): Unit = ???
+
+  override def setExternalVariables(extVars: File): Unit = ???
+
+  override def setExternalVariables(extVars: File, tunable: DaffodilTunables): Unit = ???
+
+  override def setExternalVariables(extVars: Seq[Binding]): Unit = ???
+
+  override def setTunable(tunable: String, value: String): Unit = ???
+
+  override def setTunables(tunables: Map[String, String]): Unit = ???
+
+  /**
+   * Returns an object which contains the result, and/or diagnostic information.
+   */
+  def parse(input: InputStream): ParseResult = {
+    val tempDir = os.temp.dir()
+    val infile = tempDir/"infile"
+    val outfile = tempDir/"outfile"
+    try {
+      os.write(infile, input)
+      val result = os.proc(executableFile, "parse", "-I", "xml", "-o", outfile, infile).call(cwd = tempDir, stderr = os.Pipe)
+      if (result.out.text.isEmpty && result.err.text.isEmpty) {
+        val parseResult = new ParseResult(outfile, Success)
+        parseResult
+      } else {
+        val msg = s"Unexpected daffodil output on stdout: ${result.out.text} on stderr: ${result.err.text}"
+        val parseError = new ParseError(Nope, Nope, Nope, Maybe(msg))
+        val parseResult = new ParseResult(outfile, Failure(parseError))
+        parseResult.addDiagnostic(parseError)
+        parseResult
+      }
+    } catch {
+      case e: os.SubprocessException =>
+        val parseError = if (e.result.out.text.isEmpty && e.result.err.text.isEmpty) {
+          new ParseError(Nope, Nope, Maybe(e), Nope)
+        } else {
+          val msg = s"${e.getMessage} with stdout: ${e.result.out.text} and stderr: ${e.result.err.text}"
+          new ParseError(Nope, Nope, Nope, Maybe(msg))
+        }
+        val parseResult = new ParseResult(outfile, Failure(parseError))
+        parseResult.addDiagnostic(parseError)
+        parseResult
+    }
+  }
+
+  /**
+   * Unparses (that is, serializes) data to the output, returns an object which contains any diagnostics.
+   */
+  def unparse(input: InputStream, output: OutputStream): UnparseResult = {
+    val tempDir = os.temp.dir()
+    val infile = tempDir/"infile"
+    val outfile = tempDir/"outfile"
+    try {
+      os.write(infile, input)
+      val result = os.proc(executableFile, "unparse", "-I", "xml", "-o", outfile, infile).call(cwd = tempDir, stderr = os.Pipe)
+      val finalBitPos0b = os.size(outfile) * 8 // File sizes are bytes, so must multiply to get final position in bits
+      os.read.stream(outfile).writeBytesTo(output)
+      if (result.out.text.isEmpty && result.err.text.isEmpty) {
+        val unparseResult = new UnparseResult(finalBitPos0b, Success)
+        unparseResult
+      } else {
+        val msg = s"Unexpected daffodil output on stdout: ${result.out.text} on stderr: ${result.err.text}"
+        val unparseError = new UnparseError(Nope, Nope, Nope, Maybe(msg))
+        val unparseResult = new UnparseResult(finalBitPos0b, Failure(unparseError))
+        unparseResult.addDiagnostic(unparseError)
+        unparseResult
+      }
+    } catch {
+      case e: os.SubprocessException =>
+        val unparseError = if (e.result.out.text.isEmpty && e.result.err.text.isEmpty) {
+          new UnparseError(Nope, Nope, Maybe(e), Nope)
+        } else {
+          val msg = s"${e.getMessage} with stdout: ${e.result.out.text} and stderr: ${e.result.err.text}"
+          new UnparseError(Nope, Nope, Nope, Maybe(msg))
+        }
+        val finalBitPos0b = 0L
+        val unparseResult = new UnparseResult(finalBitPos0b, Failure(unparseError))
+        unparseResult.addDiagnostic(unparseError)
+        unparseResult
+    }
+  }
+}
+
+object Runtime2DataLocation {
+  class Runtime2DataLocation(_isAtEnd: Boolean,
+                             _bitPos1b: Long,
+                             _bytePos1b: Long) extends DataLocation {
+    override def isAtEnd: Boolean = _isAtEnd
+    override def bitPos1b: Long = _bitPos1b
+    override def bytePos1b: Long = _bytePos1b
+  }
+
+  def apply(isAtEnd: Boolean = true,
+            bitPos1b: Long = 0L,
+            bytePos1b: Long = 0L): DataLocation = {
+    new Runtime2DataLocation(isAtEnd, bitPos1b, bytePos1b)
+  }
+}
+
+final class ParseResult(outfile: os.Path,
+                        override val processorStatus: ProcessorResult,
+                        loc: DataLocation = Runtime2DataLocation())
+  extends DFDL.ParseResult
+    with DFDL.State
+    with WithDiagnosticsImpl {
+
+  override def resultState: DFDL.State = this
+
+  override def validationStatus: Boolean = processorStatus.isSuccess
+
+  override def currentLocation: DataLocation = loc
+
+  def infosetAsXML : scala.xml.Elem = {
+    val xml = scala.xml.XML.loadFile(outfile.toIO)
+    xml
+  }
+}
+
+final class UnparseResult(val finalBitPos0b: Long,
+                          override val processorStatus: ProcessorResult,
+                          loc: DataLocation = Runtime2DataLocation())
+  extends DFDL.UnparseResult
+    with DFDL.State
+    with WithDiagnosticsImpl {
+  /**
+   * Data is 'scannable' if it consists entirely of textual data, and that data
+   * is all in the same encoding.
+   */
+  override def isScannable: Boolean = false // Safest answer since we don't know for sure
+
+  override def encodingName: String = ??? // We don't need encoding unless isScannable is true
+
+  override def validationStatus: Boolean = processorStatus.isSuccess
+
+  override def currentLocation: DataLocation = loc
+
+  override def resultState: DFDL.State = this
+}

Review comment:
       This is another key piece of functionality allowing TDML tests to use the executable built by the code generator.

##########
File path: daffodil-cli/src/main/scala/org/apache/daffodil/Main.scala
##########
@@ -543,11 +542,48 @@ class CLIConf(arguments: Array[String]) extends scallop.ScallopConf(arguments)
     val info = tally(descr = "increment test result information output level, one level for each -i")
   }
 
+  // Generate Subcommand Options
+  val generate = new scallop.Subcommand("generate") {
+    descr("generate <language> code from a DFDL schema")
+
+    banner("""|Usage: daffodil [GLOBAL_OPTS] generate <language> [SUBCOMMAND_OPTS]
+              |""".stripMargin)
+    shortSubcommandsHelp()
+    footer("""|
+              |Run 'daffodil generate <language> --help' for subcommand specific options""".stripMargin)
+
+    val c = new scallop.Subcommand("c") {
+      banner("""|Usage: daffodil generate c -s <schema> [-r [{namespace}]<root>]
+                |                           [-c <file>] [outputDir]
+                |
+                |Generate C code from a DFDL schema to parse or unparse data
+                |
+                |Generate Options:""".stripMargin)
+
+      descr("generate C code from a DFDL schema")
+      helpWidth(76)
+
+      val language = "c"
+      val schema = opt[URI]("schema", required = true, argName = "file", descr = "the annotated DFDL schema to use to generate source code.")
+      val rootNS = opt[RefQName]("root", argName = "node", descr = "the root element of the XML file to use.  An optional namespace may be provided. This needs to be one of the top-level elements of the DFDL schema defined with --schema. Requires --schema. If not supplied uses the first element of the first schema")
+      val tunables = props[String]('T', keyName = "tunable", valueName = "value", descr = "daffodil tunable to be used when compiling schema.")
+      val config = opt[String](short = 'c', argName = "file", descr = "path to file containing configuration items.")
+      val outputDir = trailArg[String](required = false, descr = "output directory in which to generate source code. If not specified, uses current directory.")
+
+      validateOpt(schema) {
+        case None => Left("No schemas specified using the --schema option")
+        case _ => Right(Unit)
+      }
+    }
+    addSubcommand(c)
+  }
+

Review comment:
       I like how easy these changes turned out to be.  By using two subcommands in series, we can generate code for multiple languages later even though we support only C at this time, and each language can have different code generator options if necessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-daffodil] bsloane1650 commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
bsloane1650 commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498457789



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/GrammarTerm.scala
##########
@@ -51,7 +52,8 @@ import org.apache.daffodil.dsom.Term
 abstract class Gram(contextArg: SchemaComponent)
   extends OOLAGHostImpl(contextArg)
   with BasicComponent
-  with GramRuntime1Mixin {
+  with GramRuntime1Mixin
+  with GramRuntime2Mixin {

Review comment:
       This strikes me as a functional vs. object-oriented distinction.
   
   In the functional view, the grammar objects are just a data structure. By definition the "implementation" is in functions which are defined separately. It is then a purely organizational question whether those functions should be defined near where the data type is defined.
   
   I don't think there is much glue logic to be dealt with.
   
   Setting aside the possibility of typeclasses, all we are talking about is a single parse() function with a match structure. This is a pattern we already use throughout the compiler. At least to me, the functions defined with this pattern are far easier to read, as all of the different versions of those functions can be found in one place, instead of needing to go through all of the different objects where different implementations of said functions are located.
   
   Needing to maintain the "parser" function when its implementation is scattered throughout the code seems much more tedious than needing to update it whenever a grammar object is added/removed (which happens far less frequently than implementation changes).
   
   With the addition of Runtime2, we are now putting parser logic, code generation logic, and abstract grammar logic all in the same spot.
   
   I agree that it may not be worth refactoring Runtime1. But we do not yet have any inertia behind Runtime2, so this is the best time to change how we implement it.
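
   A minimal sketch of the single-function-with-match pattern described above, assuming a toy grammar. `BinaryInt`, `SeqComp`, `EmptyGram`, `parserFor`, `cCodeFor`, and the emitted `parse_be_int32` call are all hypothetical names for illustration, not Daffodil's actual classes or generated code:

   ```scala
   // Hypothetical, simplified grammar terms: plain data with no backend logic.
   sealed trait Gram
   final case class BinaryInt(name: String, bits: Int) extends Gram
   final case class SeqComp(children: List[Gram]) extends Gram
   case object EmptyGram extends Gram

   object Runtime1Sketch {
     // All "parser" construction lives in one function with one match.
     // Strings stand in for real parser objects to keep the sketch small.
     def parserFor(g: Gram): String = g match {
       case BinaryInt(n, b) => s"BinaryIntParser($n, $b)"
       case SeqComp(cs)     => cs.map(parserFor).mkString("SeqParser(", ", ", ")")
       case EmptyGram       => "EmptyParser"
     }
   }

   object Runtime2Sketch {
     // The C generator is a second, independent function over the same data.
     def cCodeFor(g: Gram): String = g match {
       case BinaryInt(n, 32) => s"parse_be_int32(&instance->$n, pstate);"
       case BinaryInt(n, b)  => s"/* unsupported: $n is $b bits */"
       case SeqComp(cs)      => cs.map(cCodeFor).mkString("\n")
       case EmptyGram        => ""
     }
   }
   ```

   With this layout, adding a backend means adding one more function over `Gram`, and the grammar classes themselves do not change. The trade-off is that a new grammar term requires touching every such function, although with a sealed trait the compiler warns about any match that misses the new case.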







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498289830



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/GrammarTerm.scala
##########
@@ -51,7 +52,8 @@ import org.apache.daffodil.dsom.Term
 abstract class Gram(contextArg: SchemaComponent)
   extends OOLAGHostImpl(contextArg)
   with BasicComponent
-  with GramRuntime1Mixin {
+  with GramRuntime1Mixin
+  with GramRuntime2Mixin {

Review comment:
       I think it's worth having a discussion to figure out how to separate the runtime logic from the compile logic. Having to modify Grammar/compiler logic every time we want to add a new runtime is pretty painful. I'm not sure what that would look like, but having a generic solution would be nice.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.File
+import java.io.InputStream
+import java.io.OutputStream
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.api.DaffodilTunables
+import org.apache.daffodil.api.DataLocation
+import org.apache.daffodil.api.ValidationMode
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.processors.Failure
+import org.apache.daffodil.processors.ProcessorResult
+import org.apache.daffodil.processors.Success
+import org.apache.daffodil.processors.VariableMap
+import org.apache.daffodil.processors.WithDiagnosticsImpl
+import org.apache.daffodil.processors.parsers.ParseError
+import org.apache.daffodil.processors.unparsers.UnparseError
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+import os.Pipe
+
+/**
+ * Effectively a scala proxy object that does its work via the underlying C-code.
+ * Will need to consider how to use features of underlying C-code to get infoset,
+ * walk infoset, generate XML for use by TDML tests.
+ */
+class Runtime2DataProcessor(executableFile: os.Path) extends DFDL.DataProcessorBase {
+  /**
+   * Returns a data processor with all the same state, but the validation mode changed to that of the argument.
+   *
+   * Note that the default validation mode is "off", that is, no validation is performed.
+   */
+  override def withValidationMode(mode: ValidationMode.Type): DFDL.DataProcessor = ???
+
+  override def withTunable(name: String, value: String): DFDL.DataProcessor = ???
+
+  override def withTunables(tunables: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: File): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Seq[Binding]): DFDL.DataProcessor = ???
+
+  override def validationMode: ValidationMode.Type = ???
+
+  override def getTunables(): DaffodilTunables = ???
+
+  override def save(output: DFDL.Output): Unit = ???
+
+  override def variableMap: VariableMap = ???
+
+  override def setValidationMode(mode: ValidationMode.Type): Unit = ???
+
+  override def setExternalVariables(extVars: Map[String, String]): Unit = ???
+
+  override def setExternalVariables(extVars: File): Unit = ???
+
+  override def setExternalVariables(extVars: File, tunable: DaffodilTunables): Unit = ???
+
+  override def setExternalVariables(extVars: Seq[Binding]): Unit = ???
+
+  override def setTunable(tunable: String, value: String): Unit = ???
+
+  override def setTunables(tunables: Map[String, String]): Unit = ???
+
+  /**
+   * Returns an object which contains the result, and/or diagnostic information.
+   */
+  def parse(input: InputStream): ParseResult = {

Review comment:
       This feels odd to me. I would expect the runtime2 to not actually have a parse method. It would just generate an executable/library and users would have to know how to use that. So the API sort of changes depending on what backends are made available. We may need to rethink how things like the API and CLI work. For example, if this runtime2 backend were on the class path, you might expect something like this to be available:
   
   daffodil runtime2 -s schema.dfdl.xsd -o schema.bin
   
   This would generate a schema.bin executable that could then be used to do the actual parse/unparse. So the concept of a DataProcessor in runtime2 might not exist. And it would be purely a runtime1 concept?
   
   The fact that so many of these methods are unimplemented says to me that we need to put some thought into how these backends will work in practice.

##########
File path: daffodil-cli/build.sbt
##########
@@ -42,6 +42,18 @@ mappings in Universal ++= Seq(
   baseDirectory.value / "README.md" -> "README.md",
 )
 
+mappings in Universal ++= Seq(
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "common_runtime.h" -> "include/common_runtime.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "daffodil_argp.h" -> "include/daffodil_argp.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "stack.h" -> "include/stack.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "xml_reader.h" -> "include/xml_reader.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "xml_writer.h" -> "include/xml_writer.h",
+)
+
+mappings in Universal ++= Seq(
+  baseDirectory.value / ".." / "daffodil-runtime2" / "target" / "streams" / "compile" / "ccTargetMap" / "_global" / "streams" / "compile" / "sbtcc.Library" / "libruntime2.a" -> "lib/libruntime2.a",

Review comment:
       The mappings in Universal determines how we package the convenience binaries that we release in zip/tar/rpm form. So this is saying to install the header files from the runtime2 directory into ``include/`` and the compiled static library into ``lib/``. You're right that explicitly mentioning target is uncommon in an sbt file. I would expect the cc plugin to make some property available to get access to the static library. Also, like above, we should be able to use a glob for these headers.

##########
File path: build.sbt
##########
@@ -43,6 +46,32 @@ lazy val runtime1         = Project("daffodil-runtime1", file("daffodil-runtime1
                               .dependsOn(io, lib % "test->test", udf, macroLib % "compile-internal, test-internal")
                               .settings(commonSettings, usesMacros)
 
+val runtime2StaticLib     = Library("libruntime2.a")
+lazy val runtime2         = Project("daffodil-runtime2", file("daffodil-runtime2")).configs(IntegrationTest)

Review comment:
       Not an issue with this change per se, but I'm wondering if we should think about better names for our runtimes, as well as for all our subprojects. For example, the difference between daffodil-core and daffodil-lib is not obvious. Perhaps core should be something like "daffodil-schema-compiler". Likewise "runtime1" is not clear at all. Perhaps this should be "daffodil-runtime-scala-parser". It's verbose, but much more obvious to newcomers.
   
   I would even argue perhaps "runtime" isn't that good of a name. Especially in the context of this new code generator, where the runtime is the generated code, not this code generator. Maybe we want something more generic like "backend". So projects might be daffodil-backend-scala-parser, daffodil-backend-scala-unparser, daffodil-backend-generator-c, etc. And eventually we will have the capability to plug in different backends to our compiler.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/GeneratedCodeCompiler.scala
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import org.apache.commons.io.FileUtils
+import org.apache.daffodil.compiler.ProcessorFactory
+import org.apache.daffodil.dsom.SchemaDefinitionError
+import org.apache.daffodil.runtime2.generators.CodeGeneratorState
+import org.apache.daffodil.util.Misc
+import os.Pipe
+
+class GeneratedCodeCompiler(pf: ProcessorFactory) {

Review comment:
       I would expect all this logic to be in the runtime2 subproject. So the runtime is essentially made up of a jar that generates C code and compiles it with the static library. Perhaps this library could even live inside the jar and be extracted during the compilation process? So there would be no need to actually distribute the static lib in the zip/tar/rpm since it's all part of the jar?
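
   One way the extract-from-jar idea could look, as a rough sketch using only JDK classes. `ResourceExtractor`, `extract`, and any resource path passed to it are hypothetical and not part of this PR:

   ```scala
   import java.io.InputStream
   import java.nio.file.{Files, Path, StandardCopyOption}

   object ResourceExtractor {
     // Copies a classpath resource (for example, a file bundled inside the jar)
     // into destDir. Returns None if the resource is not on the classpath.
     def extract(resource: String, destDir: Path): Option[Path] = {
       val in: InputStream = getClass.getResourceAsStream(resource)
       if (in == null) None
       else
         try {
           // Use the last path segment of the resource as the file name.
           val fileName = resource.substring(resource.lastIndexOf('/') + 1)
           val dest = destDir.resolve(fileName)
           Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING)
           Some(dest)
         } finally in.close()
     }
   }
   ```

   The code that invokes the C compiler could then call something like `ResourceExtractor.extract("/some/package/libruntime2.a", tempDir)` (the resource path here is made up) and point the linker at the returned file, instead of searching for an sbt `target` directory.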

##########
File path: build.sbt
##########
@@ -43,6 +46,32 @@ lazy val runtime1         = Project("daffodil-runtime1", file("daffodil-runtime1
                               .dependsOn(io, lib % "test->test", udf, macroLib % "compile-internal, test-internal")
                               .settings(commonSettings, usesMacros)
 
+val runtime2StaticLib     = Library("libruntime2.a")
+lazy val runtime2         = Project("daffodil-runtime2", file("daffodil-runtime2")).configs(IntegrationTest)
+                              .enablePlugins(CcPlugin)
+                              .dependsOn(tdmlProc)
+                              .settings(commonSettings)
+                              .settings(publishArtifact in (Compile, packageDoc) := false)
+                              .settings(
+                                Compile / ccTargets := ListSet(runtime2StaticLib),
+                                Compile / cSources  := Map(
+                                  runtime2StaticLib -> Seq(
+                                    baseDirectory.value / "src" / "main" / "c" / "common_runtime.c",

Review comment:
       Scala has a way to essentially do a glob listing of a directory; we probably want something like that.
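
   For illustration, sbt's path finders can express such a glob. A hedged sketch of a build.sbt fragment, assuming sbt 1.x and assuming the sbt-cc `cSources` key accepts the same `Map` shape already used in the diff above:

   ```scala
   // build.sbt fragment: collect every .c file in src/main/c instead of
   // listing each file by hand. runtime2StaticLib is the Library defined above.
   Compile / cSources := Map(
     runtime2StaticLib -> (baseDirectory.value / "src" / "main" / "c" * "*.c").get
   )
   ```

   Here `*` matches files directly inside the directory, while `**` would also descend into subdirectories. The same kind of finder with `"*.h"` could be mapped over to build the header `mappings in Universal` entries.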

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/GeneratedCodeCompiler.scala
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import org.apache.commons.io.FileUtils
+import org.apache.daffodil.compiler.ProcessorFactory
+import org.apache.daffodil.dsom.SchemaDefinitionError
+import org.apache.daffodil.runtime2.generators.CodeGeneratorState
+import org.apache.daffodil.util.Misc
+import os.Pipe
+
+class GeneratedCodeCompiler(pf: ProcessorFactory) {
+  private var executableFile: os.Path = _
+  private lazy val isWindows = System.getProperty("os.name").toLowerCase().startsWith("windows")
+
+  /**
+   * Compiles and links generated C code with runtime2 library to
+   * build an executable file.
+   */
+  def compile(rootElementName: String, codeGeneratorState: CodeGeneratorState): Unit = {
+    val compiler = "cc"
+    val location = Option(this.getClass.getProtectionDomain.getCodeSource) flatMap (x => Option(x.getLocation))
+    val wd = if (os.exists(os.pwd/"daffodil-runtime2"))
+      os.pwd
+    else if (os.exists(os.pwd/os.up/"daffodil-runtime2"))
+      os.pwd/os.up
+    else if (location.isDefined)
+      os.Path(FileUtils.toFile(location.get))/os.up/os.up
+    else
+      os.pwd
+    val includeDir = if (os.exists(wd/"include"))
+      wd/"include"
+    else
+      wd/"daffodil-runtime2"/"src"/"main"/"c"
+    val libDir = if (os.exists(wd/"lib"))
+      wd/"lib"
+    else
+      wd/"daffodil-runtime2"/"target"/"streams"/"compile"/"ccTargetMap"/"_global"/"streams"/"compile"/"sbtcc.Library"

Review comment:
       The sbt-cc plugin doesn't know about these generated c files. I think the generator doesn't even necessarily have to run in the context of sbt. So this new runtime needs to know how to compile things at runtime using some other compiler. We almost certainly don't want to have this sbt path in here, with the expectation that the static library is made available somehow when the runtime2 is used.

##########
File path: daffodil-runtime2/src/main/c/.gitignore
##########
@@ -0,0 +1,24 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Ignore in case we compile the C files by hand
+*.a
+*.o
+daffodil
+
+# Ignore since these files are not fit for source control
+GPATH
+GRTAGS
+GTAGS

Review comment:
       I generally prefer we only modify the root .gitignore file. It's too easy to forget about .gitignore files in subdirectories which I think can sometimes lead to confusion.
   
   What are the daffodil and G* files? It also feels like all these files should be built in a ``target`` directory so that they are ignored by default.







[GitHub] [incubator-daffodil] tuxji commented on pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
tuxji commented on pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#issuecomment-701458598


   OK, the PR will now merge to branch `runtime2-2202` and the `wip-daffodil-2202-runtime` branch is gone.





[GitHub] [incubator-daffodil] tuxji commented on pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
tuxji commented on pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#issuecomment-707832673


   Quick status update for everyone.  I paired with Mike on refactoring the CodeGenerator methods into a match/case expression and I completed these changes after our pairing session.  However, these changes introduced a code generation problem (one struct now has a member with a different name than it had before) which I haven't debugged yet since I've been spending all of my time on another project the last few days.  I have an all day meeting today, but tomorrow I'll debug that problem, commit the changes so far, and resume working on the rest of the requested changes.





[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499591717



##########
File path: .github/workflows/main.yml
##########
@@ -20,81 +20,98 @@ on: [push, pull_request]
 jobs:
   test:
     name: Java ${{ matrix.java_version }}, Scala ${{ matrix.scala_version }}, ${{ matrix.os }}
-    runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
       matrix:
-        java_version: [ '8.x', '9.x', '11.x' ]
-        scala_version: [ '2.12.11', '2.11.12' ]
-        os: [ 'ubuntu-latest', 'windows-latest' ]
+        java_version: [ 8, 9, 11 ]
+        scala_version: [ 2.12.11, 2.11.12 ]

Review comment:
       Yep. I'd avoid rebase until current discussions die down though. Sometimes github gets confused during a rebase and some discussions can be lost.







[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r497801797



##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element

Review comment:
       Hmmm. Not sure you want to do this. This makes these things equivalent:
   ```
   <foo>43</foo>
   <foo>
      43
   </foo>
   ```
   From a straight XML comparison perspective those do not have the same XML Infoset. They're equivalent only if ignoring whitespace, which is an application decision. The slippery slope of XML equivalency says these are also equivalent:
   ```
   <foo>abc def</foo>
   <foo>abc            def</foo>
   ```
   because adjacent spaces are collapsed. This is all XML "markup language" stuff. When using XML as a data language we have to be more careful about whitespace.
   
   Do we have to handle elements with CDATA bracketing specifically with this mxml library?
   E.g., 
   ```
   <foo><![CDATA[43]]></foo>
   ```
   Or does that get handled by the library? What about entities like 
   ```
   "&#x20;"
   ```
    Does mxml convert those, or hand them through as entities for the application to deal with?
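
   To make the distinction concrete, here is a small sketch using the JDK's built-in DOM parser. It shows what a conforming XML parser reports for the cases above; it says nothing about how mxml behaves, which still needs checking. `XmlTextSketch` and `rootText` are names invented for this example:

   ```scala
   import java.io.ByteArrayInputStream
   import javax.xml.parsers.DocumentBuilderFactory

   object XmlTextSketch {
     // Returns the root element's text content exactly as the parser reports it,
     // with no trimming or whitespace collapsing applied by the application.
     def rootText(xml: String): String = {
       val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
       val doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
       doc.getDocumentElement.getTextContent
     }
   }
   ```

   With this parser, `rootText("<foo>43</foo>")` is `"43"`, while the multi-line form keeps its surrounding newlines and spaces, so the two only compare equal if the application chooses to trim. The CDATA form also yields `"43"`, and `&#x20;` arrives as a single space character, because the parser resolves both before the application sees the text.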
   
    







[GitHub] [incubator-daffodil] stevedlawrence closed pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence closed pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422


   





[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499590947



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.File
+import java.io.InputStream
+import java.io.OutputStream
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.api.DaffodilTunables
+import org.apache.daffodil.api.DataLocation
+import org.apache.daffodil.api.ValidationMode
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.processors.Failure
+import org.apache.daffodil.processors.ProcessorResult
+import org.apache.daffodil.processors.Success
+import org.apache.daffodil.processors.VariableMap
+import org.apache.daffodil.processors.WithDiagnosticsImpl
+import org.apache.daffodil.processors.parsers.ParseError
+import org.apache.daffodil.processors.unparsers.UnparseError
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+import os.Pipe
+
+/**
+ * Effectively a scala proxy object that does its work via the underlying C-code.
+ * Will need to consider how to use features of underlying C-code to get infoset,
+ * walk infoset, generate XML for use by TDML tests.
+ */
+class Runtime2DataProcessor(executableFile: os.Path) extends DFDL.DataProcessorBase {
+  /**
+   * Returns a data processor with all the same state, but the validation mode changed to that of the argument.
+   *
+   * Note that the default validation mode is "off", that is, no validation is performed.
+   */
+  override def withValidationMode(mode: ValidationMode.Type): DFDL.DataProcessor = ???
+
+  override def withTunable(name: String, value: String): DFDL.DataProcessor = ???
+
+  override def withTunables(tunables: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: File): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Seq[Binding]): DFDL.DataProcessor = ???
+
+  override def validationMode: ValidationMode.Type = ???
+
+  override def getTunables(): DaffodilTunables = ???
+
+  override def save(output: DFDL.Output): Unit = ???
+
+  override def variableMap: VariableMap = ???
+
+  override def setValidationMode(mode: ValidationMode.Type): Unit = ???
+
+  override def setExternalVariables(extVars: Map[String, String]): Unit = ???
+
+  override def setExternalVariables(extVars: File): Unit = ???
+
+  override def setExternalVariables(extVars: File, tunable: DaffodilTunables): Unit = ???
+
+  override def setExternalVariables(extVars: Seq[Binding]): Unit = ???
+
+  override def setTunable(tunable: String, value: String): Unit = ???
+
+  override def setTunables(tunables: Map[String, String]): Unit = ???
+
+  /**
+   * Returns an object which contains the result, and/or diagnostic information.
+   */
+  def parse(input: InputStream): ParseResult = {

Review comment:
       A separate subcommand such as generate doesn't seem unreasonable. Though, I can imagine that if we ended up with another runtime that generated something else (e.g. python, java, vhdl), the generate command would take very different options, which could cause confusion. Having a separate command per runtime solves this, but I think it makes usability a bit harder, especially since the backend names (e.g. runtime1) aren't very obvious when compared to "parse/unparse".
   
   Maybe the best option is to just make our CLI runtime1-focused as Mike suggests, but then we still need a way to execute runtime2. Maybe it's a separate CLI tool with a completely different name, and all new runtimes will get a unique CLI. This too might be worth a larger discussion on the dev list: how are people expected to use and trigger different backends?







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499572354



##########
File path: build.sbt
##########
@@ -43,6 +46,32 @@ lazy val runtime1         = Project("daffodil-runtime1", file("daffodil-runtime1
                               .dependsOn(io, lib % "test->test", udf, macroLib % "compile-internal, test-internal")
                               .settings(commonSettings, usesMacros)
 
+val runtime2StaticLib     = Library("libruntime2.a")
+lazy val runtime2         = Project("daffodil-runtime2", file("daffodil-runtime2")).configs(IntegrationTest)
+                              .enablePlugins(CcPlugin)
+                              .dependsOn(tdmlProc)
+                              .settings(commonSettings)
+                              .settings(publishArtifact in (Compile, packageDoc) := false)
+                              .settings(
+                                Compile / ccTargets := ListSet(runtime2StaticLib),
+                                Compile / cSources  := Map(
+                                  runtime2StaticLib -> Seq(
+                                    baseDirectory.value / "src" / "main" / "c" / "common_runtime.c",

Review comment:
       SBT has a pretty sophisticated way to match files. To do globbing, it looks something like this:
   ```scala
   val files = (directory * GlobFilter("*.c")).get
   ```
   Note that you need the parentheses and the .get to actually evaluate the paths/filters; without the parentheses, .get would bind to the GlobFilter rather than to the whole expression. A single asterisk will just look at children in the directory; a double asterisk (**) will recurse through all directories.
   
   Also note that rather than doing ``baseDirectory.value`` and adding ``src/main/``, you can use ``sourceDirectory.value``, which is essentially ``baseDirectory.value / "src"``. Going one step further, if we get the setting from the ``Compile`` config like this, ``(Compile / sourceDirectory).value``, we get the additional ``main`` directory. This compares to doing something like ``(Test / sourceDirectory).value``, which will get you ``src/test`` instead of ``src/main``. So the "sbt-y" way to glob source files is something like this:
   
   ```scala
   ((Compile / sourceDirectory).value / "c" / "common_runtime" * GlobFilter("*.c")).get
   ```
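   
   For readers less familiar with sbt, the difference between `*` (children only) and `**` (recursive) can be sketched in plain Scala with `java.nio.file`. This is not sbt's implementation; `GlobSketch`, `children` and `descendants` are names invented for this illustration, and unlike sbt's path finders the sketch only returns regular files.
   
   ```scala
   import java.nio.file.{Files, Path}
   import scala.collection.mutable.ArrayBuffer

   object GlobSketch {
     // Entries directly inside dir whose file names match the glob pattern.
     private def entries(dir: Path, glob: String): List[Path] = {
       val stream = Files.newDirectoryStream(dir, glob)
       try {
         val buf = ArrayBuffer.empty[Path]
         val it = stream.iterator()
         while (it.hasNext) buf += it.next()
         buf.toList
       } finally stream.close()
     }

     // Roughly what `dir * "*.c"` selects: matching files directly in dir.
     def children(dir: Path, glob: String): List[Path] =
       entries(dir, glob).filter(p => Files.isRegularFile(p)).sortBy(_.toString)

     // Roughly what `dir ** "*.c"` selects: matching files in dir and all subdirectories.
     def descendants(dir: Path, glob: String): List[Path] = {
       val subdirs = entries(dir, "*").filter(p => Files.isDirectory(p))
       (children(dir, glob) ++ subdirs.flatMap(d => descendants(d, glob))).sortBy(_.toString)
     }
   }
   ```
   
   With a layout such as `c/a.c` and `c/sub/b.c`, `children` would return only `a.c`, while `descendants` would return both files.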







[GitHub] [incubator-daffodil] tuxji commented on pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
tuxji commented on pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#issuecomment-701419675


   Yes, there is much more work that needs to be done to generate C code for a bigger subset of DFDL types and schemas.  I would say enough work has been done on my fork and now it's time to get more eyes, discussion, and collaboration on the runtime2 backend by building out the rest of the C code generator in public instead of my fork.  However, I can't predict how much pushback I would get from Daffodil developers if their sbt build starts trying to compile C source files as well as Scala source files.  I think it makes more sense to let multiple developers collaborate and experiment with the runtime2 backend on a new branch until we reach broad agreement on how the runtime2 backend should look and how it should affect the main Daffodil build.  That's why I picked the name `wip-daffodil-2202-runtime2` for the new branch.





[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498502563



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.File
+import java.io.InputStream
+import java.io.OutputStream
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.api.DaffodilTunables
+import org.apache.daffodil.api.DataLocation
+import org.apache.daffodil.api.ValidationMode
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.processors.Failure
+import org.apache.daffodil.processors.ProcessorResult
+import org.apache.daffodil.processors.Success
+import org.apache.daffodil.processors.VariableMap
+import org.apache.daffodil.processors.WithDiagnosticsImpl
+import org.apache.daffodil.processors.parsers.ParseError
+import org.apache.daffodil.processors.unparsers.UnparseError
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+import os.Pipe
+
+/**
+ * Effectively a scala proxy object that does its work via the underlying C-code.
+ * Will need to consider how to use features of underlying C-code to get infoset,
+ * walk infoset, generate XML for use by TDML tests.
+ */
+class Runtime2DataProcessor(executableFile: os.Path) extends DFDL.DataProcessorBase {
+  /**
+   * Returns a data processor with all the same state, but the validation mode changed to that of the argument.
+   *
+   * Note that the default validation mode is "off", that is, no validation is performed.
+   */
+  override def withValidationMode(mode: ValidationMode.Type): DFDL.DataProcessor = ???
+
+  override def withTunable(name: String, value: String): DFDL.DataProcessor = ???
+
+  override def withTunables(tunables: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: File): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Seq[Binding]): DFDL.DataProcessor = ???
+
+  override def validationMode: ValidationMode.Type = ???
+
+  override def getTunables(): DaffodilTunables = ???
+
+  override def save(output: DFDL.Output): Unit = ???
+
+  override def variableMap: VariableMap = ???
+
+  override def setValidationMode(mode: ValidationMode.Type): Unit = ???
+
+  override def setExternalVariables(extVars: Map[String, String]): Unit = ???
+
+  override def setExternalVariables(extVars: File): Unit = ???
+
+  override def setExternalVariables(extVars: File, tunable: DaffodilTunables): Unit = ???
+
+  override def setExternalVariables(extVars: Seq[Binding]): Unit = ???
+
+  override def setTunable(tunable: String, value: String): Unit = ???
+
+  override def setTunables(tunables: Map[String, String]): Unit = ???
+
+  /**
+   * Returns an object which contains the result, and/or diagnostic information.
+   */
+  def parse(input: InputStream): ParseResult = {

Review comment:
       Arguably, the current CLI is a daffodil-runtime1 CLI, and a new runtime should create its own tools. If they can share library code great, but perhaps we don't even hook runtime2 into the existing CLI at all?







[GitHub] [incubator-daffodil] tuxji merged pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
tuxji merged pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422


   





[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499785309



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/ElementCombinator.scala
##########
@@ -328,9 +351,21 @@ class ElementParseAndUnspecifiedLength(context: ElementBase, eBeforeGram: Gram,
       new ElementUnparserNoRep(context.erd, uSetVar)
     }
   }
+
+  override def generateCode(cgState: CodeGeneratorState): Unit = {
+    context.schemaDefinitionWhen(context.inputValueCalcOption.isDefined, "Elements with inputValueCalc are not supported.")
+    context.schemaDefinitionWhen(context.outputValueCalcOption.isDefined, "Elements with outputValueCalc are not supported.")
+    context.schemaDefinitionUnless(eBeforeGram.isEmpty, "Statements associated with elements are not supported.")
+    context.schemaDefinitionUnless(eAfterGram.isEmpty, "Statements associated with elements are not supported.")
+    context.schemaDefinitionUnless(repTypeElementGram.isEmpty, "dfdlx:repType is not supported.")
+
+    val elementContentGenerator = eGram // a Gram isA ParserGenerator
+    val e = new ElementParserGenerator(context, elementContentGenerator)
+    e.generateCode(cgState)
+  }

Review comment:
       I would get a clean commit, stick down a branch, then try it. Honestly I think it will be pretty simple given that only a few things have generateCode methods on them currently. Several are no-ops, and the default that "errors out" becomes the default case of a match expression in the one true `CCodeGenerator.generate(g: Gram, cgs: CodeGeneratorState): Unit` function.
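   
   A rough sketch of what such a single match-based function could look like is below. Everything in it is a simplified stand-in: the `Gram` subclasses, the `CodeGeneratorState` with its `emit` method, and the emitted C statement `parse_be_int32(pstate);` are all hypothetical and only illustrate the shape of the refactoring, not Daffodil's real classes.
   
   ```scala
   object GenerateSketch {
     // Hypothetical, tiny stand-in for the real Gram hierarchy.
     sealed trait Gram
     case class ElementGram(name: String, content: Gram) extends Gram
     case class SeqGram(children: List[Gram]) extends Gram
     case object Int32Gram extends Gram         // 32-bit big-endian binary integer
     case object CaptureLengthGram extends Gram // a gram that generates no code
     case object DelimiterGram extends Gram     // a gram the sketch does not support

     // Hypothetical state object that just accumulates generated C lines.
     final class CodeGeneratorState {
       private val lines = scala.collection.mutable.ArrayBuffer.empty[String]
       def emit(line: String): Unit = lines += line
       def result: List[String] = lines.toList
     }

     // One function walks the grammar tree; the Gram classes need no generateCode method.
     def generate(g: Gram, cgs: CodeGeneratorState): Unit = g match {
       case ElementGram(name, content) =>
         cgs.emit(s"// element $name")
         generate(content, cgs)
       case SeqGram(children) =>
         children.foreach(child => generate(child, cgs))
       case Int32Gram =>
         cgs.emit("parse_be_int32(pstate);")
       case CaptureLengthGram =>
         () // no-op: nothing to generate
       case other =>
         // Default case: the former "errors out" behavior.
         throw new UnsupportedOperationException(s"Code generation not supported for $other")
     }
   }
   ```
   
   The recursion through `ElementGram` and `SeqGram` is what replaces the virtual dispatch of the per-class generateCode methods.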







[GitHub] [incubator-daffodil] bsloane1650 commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
bsloane1650 commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498423115



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/GrammarTerm.scala
##########
@@ -51,7 +52,8 @@ import org.apache.daffodil.dsom.Term
 abstract class Gram(contextArg: SchemaComponent)
   extends OOLAGHostImpl(contextArg)
   with BasicComponent
-  with GramRuntime1Mixin {
+  with GramRuntime1Mixin
+  with GramRuntime2Mixin {

Review comment:
       I think what we are looking for here is typeclasses. If the GramRuntime1Mixin was a typeclass, we could define it in the Runtime1 package; and provide an implementation in the Runtime1 package for all of the Gram objects while leaving the Gram objects themselves in the Core package.
   
   Unfortunately, typeclasses are not a first-class citizen in Scala, but they can be emulated using implicits. https://scalac.io/typeclasses-in-scala/
   
   In the past, I have been disappointed with this approach, but I believe that was largely a result of the implications with respect to boxing, which should not be an issue in this case.
   
   An alternative approach would be to just turn parser() and generateCode() from methods of Gram (through a mixin) into simple functions in Runtime1/Runtime2 that take a Gram as an argument.
   
   In either case, I would suggest that the way forward is that we implement Runtime2 within the Runtime2 package without adding a mixin to the Gram objects. Once that is done and we have a good feel for the method; we can discuss if it is worth refactoring Runtime1 as well.
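   
   As a minimal sketch of the implicit-based emulation (all names here are hypothetical stand-ins, not existing Daffodil classes): the typeclass `CodeGen` and its instance would live in the Runtime2 package, while the `Gram` classes stay untouched in Core.
   
   ```scala
   object TypeclassSketch {
     // Stand-ins for Gram classes that would remain in the Core package.
     sealed trait Gram
     case object Int32Gram extends Gram
     case class SeqGram(children: List[Gram]) extends Gram

     // The typeclass: behavior defined outside the Gram classes.
     trait CodeGen[A] {
       def generateCode(a: A): List[String]
     }

     object CodeGen {
       // The instance for Gram, found through implicit scope.
       implicit val gramCodeGen: CodeGen[Gram] = new CodeGen[Gram] {
         def generateCode(g: Gram): List[String] = g match {
           case Int32Gram         => List("parse_be_int32(pstate);")
           case SeqGram(children) => children.flatMap(c => generateCode(c))
         }
       }
     }

     // Syntax helper so callers can write g.generateCode without a mixin on Gram.
     implicit class CodeGenOps[A](private val a: A)(implicit cg: CodeGen[A]) {
       def generateCode: List[String] = cg.generateCode(a)
     }
   }
   ```
   
   Note that the instance body is itself just a function over `Gram` with a match, which is why the plain-function alternative above may turn out to be the simpler of the two.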







[GitHub] [incubator-daffodil] tuxji commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
tuxji commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499035314



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/ElementCombinator.scala
##########
@@ -328,9 +351,21 @@ class ElementParseAndUnspecifiedLength(context: ElementBase, eBeforeGram: Gram,
       new ElementUnparserNoRep(context.erd, uSetVar)
     }
   }
+
+  override def generateCode(cgState: CodeGeneratorState): Unit = {
+    context.schemaDefinitionWhen(context.inputValueCalcOption.isDefined, "Elements with inputValueCalc are not supported.")
+    context.schemaDefinitionWhen(context.outputValueCalcOption.isDefined, "Elements with outputValueCalc are not supported.")
+    context.schemaDefinitionUnless(eBeforeGram.isEmpty, "Statements associated with elements are not supported.")
+    context.schemaDefinitionUnless(eAfterGram.isEmpty, "Statements associated with elements are not supported.")
+    context.schemaDefinitionUnless(repTypeElementGram.isEmpty, "dfdlx:repType is not supported.")
+
+    val elementContentGenerator = eGram // a Gram isA ParserGenerator
+    val e = new ElementParserGenerator(context, elementContentGenerator)
+    e.generateCode(cgState)
+  }

Review comment:
       Mike, please let me know if you think it would be easier to keep these methods and remove GramRuntime2Mixin at a later time, or if you think you will have time to pair with me to help me refactor these methods to a function in daffodil-backend-generator-c.  Unless you can point me to some example code in Daffodil, I doubt it will be obvious to me how to walk the gram objects and come across the corresponding ElementParseAndUnspecifiedLength gram object so I can move the code above to a match expression inside the new function.

##########
File path: build.sbt
##########
@@ -43,6 +46,32 @@ lazy val runtime1         = Project("daffodil-runtime1", file("daffodil-runtime1
                               .dependsOn(io, lib % "test->test", udf, macroLib % "compile-internal, test-internal")
                               .settings(commonSettings, usesMacros)
 
+val runtime2StaticLib     = Library("libruntime2.a")
+lazy val runtime2         = Project("daffodil-runtime2", file("daffodil-runtime2")).configs(IntegrationTest)
+                              .enablePlugins(CcPlugin)
+                              .dependsOn(tdmlProc)
+                              .settings(commonSettings)
+                              .settings(publishArtifact in (Compile, packageDoc) := false)
+                              .settings(
+                                Compile / ccTargets := ListSet(runtime2StaticLib),
+                                Compile / cSources  := Map(
+                                  runtime2StaticLib -> Seq(
+                                    baseDirectory.value / "src" / "main" / "c" / "common_runtime.c",

Review comment:
       OK, let's choose a time to move every file that goes into the static lib into its own subdirectory and then glob that directory in build.sbt.  I'd appreciate help from someone more familiar with sbt anyway.

##########
File path: .github/workflows/main.yml
##########
@@ -20,81 +20,98 @@ on: [push, pull_request]
 jobs:
   test:
     name: Java ${{ matrix.java_version }}, Scala ${{ matrix.scala_version }}, ${{ matrix.os }}
-    runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
       matrix:
-        java_version: [ '8.x', '9.x', '11.x' ]
-        scala_version: [ '2.12.11', '2.11.12' ]
-        os: [ 'ubuntu-latest', 'windows-latest' ]
+        java_version: [ 8, 9, 11 ]
+        scala_version: [ 2.12.11, 2.11.12 ]

Review comment:
       Shall I create a bug issue and remove support for Scala 2.11 in a separate PR, then rebase my branch to catch up with asf/master again?

##########
File path: daffodil-runtime2/src/main/c/daffodil_argp.c
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "daffodil_argp.h" // for daffodil_cli, daffodil_parse_cli, ...
+#include <argp.h>          // for argp_state, argp_error, error_t, argp_parse
+#include <stdio.h>         // for sprintf
+#include <stdlib.h>        // for putenv, NULL
+#include <string.h>        // for strlen, strcmp
+
+// Initialize our "daffodil" name and version
+
+const char *argp_program_version = "Apache Daffodil (runtime2) 0.1";
+
+// Initialize our "daffodil parse" CLI options
+
+struct daffodil_parse_cli daffodil_parse = {
+    "xml", // default infoset type
+    "-",   // default infile
+    "-",   // default outfile
+};
+
+static const struct argp_option parse_options[] = {
+    {"infoset-type", 'I', "<infoset_type>", 0,
+     "Infoset type to output. Must be one of 'xml' or 'null'"},
+
+    {"output", 'o', "<file>", 0,
+     "Write output to a given file. If not given or is -, output is written to "
+     "stdout"},
+
+    {0}};
+
+static error_t parse_handler(int key, char *arg, struct argp_state *state);
+
+static const char parse_args_doc[] = "[infile]";
+
+static const char parse_doc[] =
+    "\n"
+    "Parse a file using a DFDL schema\n"
+    "\n"
+    "Parse Options:"
+    "\v"
+    " Trailing arguments:\n"
+    "  infile (not required)      input file to parse. "
+    "If not specified, or a value of -, reads from stdin";
+
+static const struct argp parse_argp = {
+    parse_options,  // array of CLI options
+    parse_handler,  // function to get these CLI options
+    parse_args_doc, // short usage documentation
+    parse_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil parse" CLI options
+
+static error_t
+parse_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_parse_cli *parse = state->input;
+
+    switch (key)
+    {
+    case 'I':
+        parse->infoset_type = arg;
+        break;
+
+    case 'o':
+        parse->outfile = arg;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (state->arg_num)
+        {
+            argp_error(state, "too many arguments: %s", arg);
+        }
+        parse->infile = arg;
+        break;
+
+    default:
+        return ARGP_ERR_UNKNOWN;
+    }
+
+    return 0;
+}
+
+// Parse our "daffodil parse" command line interface
+
+static error_t
+parse_daffodil_parse_cli(struct argp_state *state)
+{
+    int    argc = state->argc - state->next + 1;
+    char **argv = &state->argv[state->next - 1];
+    char * old_cmd = argv[0];
+    char   new_cmd[strlen(state->name) + strlen(" parse") + 1];
+
+    sprintf(new_cmd, "%s parse", state->name);
+    argv[0] = new_cmd;
+
+    error_t status = argp_parse(&parse_argp, argc, argv, ARGP_IN_ORDER, &argc,
+                                &daffodil_parse);
+
+    argv[0] = old_cmd;
+    state->next += argc - 1;
+
+    return status;
+}
+
+// Initialize our "daffodil unparse" CLI options
+
+struct daffodil_unparse_cli daffodil_unparse = {
+    "xml", // default infoset type
+    "-",   // default infile
+    "-",   // default outfile
+};
+
+static const struct argp_option unparse_options[] = {
+    {"infoset-type", 'I', "<infoset_type>", 0,
+     "Infoset type to unparse. Must be 'xml'"},
+
+    {"output", 'o', "<file>", 0,
+     "Write output to file. If not given or is -, output is written to "
+     "standard output"},
+
+    {0}};
+
+static error_t unparse_handler(int key, char *arg, struct argp_state *state);
+
+static const char unparse_args_doc[] = "[infile]";
+
+static const char unparse_doc[] =
+    "\n"
+    "Unparse an infoset file using a DFDL schema\n"
+    "\n"
+    "Unparse Options:"
+    "\v"
+    " Trailing arguments:\n"
+    "  infile (not required)      input file to unparse. If not specified, or "
+    "a value of -, reads from stdin";
+
+static const struct argp unparse_argp = {
+    unparse_options,  // array of CLI options
+    unparse_handler,  // function to get these CLI options
+    unparse_args_doc, // short usage documentation
+    unparse_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil unparse" CLI options
+
+static error_t
+unparse_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_unparse_cli *unparse = state->input;
+
+    switch (key)
+    {
+    case 'I':
+        unparse->infoset_type = arg;
+        break;
+
+    case 'o':
+        unparse->outfile = arg;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (state->arg_num)
+        {
+            argp_error(state, "too many arguments: %s", arg);
+        }
+        unparse->infile = arg;
+        break;
+
+    default:
+        return ARGP_ERR_UNKNOWN;
+    }
+
+    return 0;
+}
+
+// Parse our "daffodil unparse" command line interface
+
+static error_t
+parse_daffodil_unparse_cli(struct argp_state *state)
+{
+    int    argc = state->argc - state->next + 1;
+    char **argv = &state->argv[state->next - 1];
+    char * old_cmd = argv[0];
+    char   new_cmd[strlen(state->name) + strlen(" unparse") + 1];
+
+    sprintf(new_cmd, "%s unparse", state->name);
+    argv[0] = new_cmd;
+
+    error_t status = argp_parse(&unparse_argp, argc, argv, ARGP_IN_ORDER, &argc,
+                                &daffodil_unparse);
+
+    argv[0] = old_cmd;
+    state->next += argc - 1;
+
+    return status;
+}
+
+// Initialize our "daffodil" CLI options
+
+struct daffodil_cli daffodil_cli = {
+    DAFFODIL_NONE, // default subcommand
+    0,             // default verbosity
+};
+
+static const struct argp_option daffodil_options[] = {
+    {"verbose", 'v', 0, 0, "Increment verbosity level, one level for each -v",
+     -1},
+
+    {0}};
+
+static error_t daffodil_handler(int key, char *arg, struct argp_state *state);
+
+static const char daffodil_args_doc[] = "<subcommand> [SUBCOMMAND_OPTION...]";
+
+static const char daffodil_doc[] =
+    "\n"
+    "Global Options:"
+    "\v"
+    "Subcommands:\n"
+    "  parse         Parse data to a DFDL infoset\n"
+    "  unparse       Unparse a DFDL infoset\n"
+    "\n"
+    "Run 'daffodil <subcommand> --help' for subcommand specific options";
+
+static const struct argp daffodil_argp = {
+    daffodil_options,  // array of CLI options
+    daffodil_handler,  // function to get these CLI options
+    daffodil_args_doc, // short usage documentation
+    daffodil_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil" CLI options
+
+static error_t
+daffodil_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_cli *daffodil = state->input;

Review comment:
       Sure, it's a local variable so we can name it "cli" instead of "daffodil".

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/ElementCombinator.scala
##########
@@ -280,8 +300,11 @@ case class CaptureValueLengthEnd(ctxt: ElementBase)
       new CaptureEndOfValueLengthUnparser(ctxt.erd)
     else
       new NadaUnparser(ctxt.erd)
-}
 
+  override def generateCode(state: CodeGeneratorState): Unit = {
+    // Not generating code here

Review comment:
       Actually, I will have to remove these generateCode() methods anyway once I delete GramRuntime2Mixin and its generateCode() method.  I'm not sure how easy it will be to do this refactoring, but I'd like to do it sooner rather than later.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/GeneratedCodeCompiler.scala
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import org.apache.commons.io.FileUtils
+import org.apache.daffodil.compiler.ProcessorFactory
+import org.apache.daffodil.dsom.SchemaDefinitionError
+import org.apache.daffodil.runtime2.generators.CodeGeneratorState
+import org.apache.daffodil.util.Misc
+import os.Pipe
+
+class GeneratedCodeCompiler(pf: ProcessorFactory) {
+  private var executableFile: os.Path = _
+  private lazy val isWindows = System.getProperty("os.name").toLowerCase().startsWith("windows")
+
+  /**
+   * Compiles and links generated C code with runtime2 library to
+   * build an executable file.
+   */
+  def compile(rootElementName: String, codeGeneratorState: CodeGeneratorState): Unit = {
+    val compiler = "cc"
+    val location = Option(this.getClass.getProtectionDomain.getCodeSource) flatMap (x => Option(x.getLocation))
+    val wd = if (os.exists(os.pwd/"daffodil-runtime2"))
+      os.pwd
+    else if (os.exists(os.pwd/os.up/"daffodil-runtime2"))
+      os.pwd/os.up
+    else if (location.isDefined)
+      os.Path(FileUtils.toFile(location.get))/os.up/os.up
+    else
+      os.pwd
+    val includeDir = if (os.exists(wd/"include"))
+      wd/"include"
+    else
+      wd/"daffodil-runtime2"/"src"/"main"/"c"
+    val libDir = if (os.exists(wd/"lib"))
+      wd/"lib"
+    else
+      wd/"daffodil-runtime2"/"target"/"streams"/"compile"/"ccTargetMap"/"_global"/"streams"/"compile"/"sbtcc.Library"

Review comment:
       The logic here is indeed overly complicated because it has to support two cases: running a test from the IDE (in which case we need to look for the headers and static lib in sbt paths) and running a TDML test from the daffodil command line (in which case we need to look for the headers and static lib in daffodil/{include,lib}).  If we don't want to check sbt paths in the logic here, then we need to extract the headers and static lib (or headers and source files) from our jar at runtime.  If we go the extraction route, we will want to cache the static lib somewhere to avoid compiling it every time this method is called.

##########
File path: daffodil-cli/build.sbt
##########
@@ -42,6 +42,18 @@ mappings in Universal ++= Seq(
   baseDirectory.value / "README.md" -> "README.md",
 )
 
+mappings in Universal ++= Seq(
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "common_runtime.h" -> "include/common_runtime.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "daffodil_argp.h" -> "include/daffodil_argp.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "stack.h" -> "include/stack.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "xml_reader.h" -> "include/xml_reader.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "xml_writer.h" -> "include/xml_writer.h",
+)
+
+mappings in Universal ++= Seq(
+  baseDirectory.value / ".." / "daffodil-runtime2" / "target" / "streams" / "compile" / "ccTargetMap" / "_global" / "streams" / "compile" / "sbtcc.Library" / "libruntime2.a" -> "lib/libruntime2.a",

Review comment:
       Yes, the cc plugin does make a property available to get access to the static lib.  I was able to use that property when I had these mappings in Universal inside the daffodil-runtime2 project in the parent directory's build.sbt, but I wasn't able to figure out how to use that property when I moved the mappings to daffodil-cli/build.sbt here.  If anyone can look at <https://github.com/tnakamot/sbt-cc/blob/master/examples/packaging-with-universal/README.md> and figure out a way, I'd appreciate it.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns
+         |        "$ns", // namedQName.ns
+         |    },""".stripMargin
+    qnameInit
+  }
+
+  def addComplexTypeERD(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val count = structs.top.declarations.length
+    val offsetComputations = structs.top.offsetComputations.mkString(",\n")
+    val erdComputations = structs.top.erdComputations.mkString(",\n")
+    val qnameInit = defineQNameInit(context)
+    val complexERD =
+      s"""static const $C ${C}_compute_ERD_offsets;
+         |
+         |static const ptrdiff_t ${C}_offsets[$count] = {
+         |$offsetComputations
+         |};
+         |
+         |static const ERD *${C}_childrenERDs[$count] = {
+         |$erdComputations
+         |};
+         |
+         |static const ERD ${C}_ERD = {
+         |$qnameInit
+         |    COMPLEX,                         // typeCode
+         |    $count,                               // numChildren
+         |    ${C}_offsets,                      // offsets
+         |    ${C}_childrenERDs,                 // childrenERDs
+         |    (ERDInitSelf)&${C}_initSelf,       // initSelf
+         |    (ERDParseSelf)&${C}_parseSelf,     // parseSelf
+         |    (ERDUnparseSelf)&${C}_unparseSelf, // unparseSelf
+         |};
+         |""".stripMargin
+    erds += complexERD
+  }
+
+  def addStruct(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val declarations = structs.top.declarations.mkString("\n")
+    val struct =
+      s"""typedef struct $C
+         |{
+         |    InfosetBase _base;
+         |$declarations
+         |} $C;
+         |""".stripMargin
+    finalStructs += struct
+    val initStatement = s"    instance->_base.erd = &${C}_ERD;"
+    structs.top.initStatements += initStatement
+  }
+
+  def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def addComplexTypeStatements(child: ElementBase): Unit = {
+    val C = child.namedQName.local
+    val e = child.name
+    val initStatement = s"    ${C}_initSelf(&instance->$e);"
+    val parseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_parseSelf(&instance->$e, pstate);
+         |    }""".stripMargin
+    val unparseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_unparseSelf(&instance->$e, ustate);
+         |    }""".stripMargin
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def pushComplexElement(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    structs.push(new ComplexCGState(C))
+  }
+
+  def popComplexElement(context: ElementBase): Unit = {
+    structs.pop()
+  }
+
+  def addSimpleTypeERD(context: ElementBase): Unit = {
+    val e = context.namedQName.local
+    val qnameInit = defineQNameInit(context)
+    val typeCode = context.optPrimType.get match {
+      case PrimType.Int => "PRIMITIVE_INT32"
+      case PrimType.String => "PRIMITIVE_STRING"
+      case p: PrimType => context.SDE("PrimType %s not supported yet.", p.toString)
+    }
+    val erd =
+      s"""static const ERD ${e}_ERD = {
+         |$qnameInit
+         |    $typeCode, // typeCode
+         |    0,               // numChildren
+         |    NULL,            // offsets
+         |    NULL,            // childrenERDs
+         |    NULL,            // initSelf
+         |    NULL,            // parseSelf
+         |    NULL,            // unparseSelf
+         |};
+         |""".stripMargin
+    erds += erd
+    addComputations(context)
+  }
+
+  def addComputations(child: ElementBase): Unit = {
+    val C = structs.top.C
+    val e = child.namedQName.local
+    val offsetComputation = s"    (char *)&${C}_compute_ERD_offsets.$e - (char *)&${C}_compute_ERD_offsets"
+    val erdComputation = s"    &${e}_ERD"
+    structs.top.offsetComputations += offsetComputation
+    structs.top.erdComputations += erdComputation
+  }
+
+  def addFieldDeclaration(context: ThrowsSDE, child: ElementBase): Unit = {
+    val definition = if (child.isSimpleType) {
+      import NodeInfo.PrimType
+      child.optPrimType.get match {
+        case PrimType.Long => "int64_t    "
+        case PrimType.Int => "int32_t    "
+        case x => context.SDE("Unsupported primitive type: " + x)
+      }
+    } else {
+      child.namedQName.local + "         "
+    }
+    structs.top.declarations += s"    $definition ${child.name};"
+  }
+
+  def viewCodeHeader: String = {

Review comment:
       These members were named with a "view" prefix because of an old method named "view" that returned a string of C code passed into the object.  Yes, "generate" would be a better prefix.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)

Review comment:
       Agreed, will avoid null pointers by using Option types and map { => }.

##########
File path: daffodil-core/src/test/scala/org/apache/daffodil/runtime2/TestGeneratedCodeCompiler.scala
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.ByteArrayInputStream
+
+import org.apache.daffodil.compiler.Compiler
+import org.apache.daffodil.infoset.DIComplex
+import org.apache.daffodil.infoset.TestInfoset
+import org.apache.daffodil.util.Misc
+import org.apache.daffodil.util.SchemaUtils
+import org.junit.Test
+
+// Now that we can run TDML tests with runtime2, this test's remaining
+// value is for debugging of runtime2 components.
+class TestGeneratedCodeCompiler {
+
+  @Test
+  def compileRunParseInt32(): Unit = {
+    // Compile a DFDL schema to parse int32 numbers
+    val testSchema = SchemaUtils.dfdlTestSchema(
+        <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>,
+        <dfdl:format representation="binary" ref="GeneralFormat"/>,
+      <xs:element name="c1">
+        <xs:complexType>
+          <xs:sequence>
+            <xs:element name="e1" type="xs:int"/>
+            <xs:element name="c2">
+              <xs:complexType>
+                <xs:sequence>
+                  <xs:element name="e2" type="xs:int"/>

Review comment:
       We probably need a sequence containing a repeated element with minOccurs > 1.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/SequenceCombinator.scala
##########
@@ -91,6 +92,16 @@ class OrderedSequence(sq: SequenceTermBase, sequenceChildrenArg: Seq[SequenceChi
       }
     }
   }
+
+  override def generateCode(cgState: CodeGeneratorState): Unit = {
+    //
+    // To lift this draconian restriction, we have to
+    // generate code for each of the children, and combine them into a block
+    //
+    sq.schemaDefinitionUnless(sequenceChildren.length == 1, "Only a single child of a sequence is supported.")

Review comment:
       Hmm, our test does have a nested complex struct type with two elements, both of which are 32-bit ints.  I can only think that Daffodil must have turned that complex type into something other than an OrderedSequence with two children.  Maybe that happened because each element had its own definition and name (simple types e2 and e3), and we would need a simple type e2 in a sequence with minOccurs = 2 to create an OrderedSequence with two children.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns

Review comment:
       We are initializing static variables, which ensures that any members omitted from the initializer are initialized to null.  We can be more explicit and use complete initializers instead of short initializers if necessary.
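
       As a minimal sketch of the C rule behind this: the NamedQName struct below is a simplified stand-in, not the actual definition from common_runtime.h, and check_short_initializer is a hypothetical helper added only for illustration.  It shows that members left out of a short initializer end up as null pointers, so the short form and the complete form have the same contents.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the runtime's QName struct (assumed layout). */
typedef struct NamedQName
{
    const char *name;
    const char *xmlns;
    const char *ns;
} NamedQName;

/* Short initializer: only the first member is given. */
static const NamedQName short_qname = {"e1"};

/* Complete initializer: every member is spelled out. */
static const NamedQName full_qname = {
    "e1", // name
    NULL, // xmlns
    NULL, // ns
};

/* Hypothetical helper: checks that both forms hold the same values. */
void
check_short_initializer(void)
{
    assert(short_qname.name != NULL);
    assert(short_qname.xmlns == full_qname.xmlns);
    assert(short_qname.ns == full_qname.ns);
    assert(short_qname.xmlns == NULL && short_qname.ns == NULL);
}
```

       The complete form costs a few more generated lines but makes each member visible to a reader of the generated code.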

##########
File path: daffodil-core/src/test/scala/org/apache/daffodil/runtime2/TestGeneratedCodeCompiler.scala
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.ByteArrayInputStream
+
+import org.apache.daffodil.compiler.Compiler
+import org.apache.daffodil.infoset.DIComplex
+import org.apache.daffodil.infoset.TestInfoset
+import org.apache.daffodil.util.Misc
+import org.apache.daffodil.util.SchemaUtils
+import org.junit.Test
+
+// Now that we can run TDML tests with runtime2, this test's remaining
+// value is for debugging of runtime2 components.
+class TestGeneratedCodeCompiler {
+
+  @Test
+  def compileRunParseInt32(): Unit = {
+    // Compile a DFDL schema to parse int32 numbers
+    val testSchema = SchemaUtils.dfdlTestSchema(
+        <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>,
+        <dfdl:format representation="binary" ref="GeneralFormat"/>,
+      <xs:element name="c1">
+        <xs:complexType>
+          <xs:sequence>
+            <xs:element name="e1" type="xs:int"/>
+            <xs:element name="c2">
+              <xs:complexType>
+                <xs:sequence>
+                  <xs:element name="e2" type="xs:int"/>
+                  <xs:element name="e3" type="xs:int"/>
+                </xs:sequence>
+              </xs:complexType>
+            </xs:element>
+          </xs:sequence>
+        </xs:complexType>
+      </xs:element>)
+    val schemaCompiler = Compiler()
+    val pf = schemaCompiler.compileNode(testSchema)
+    assert(!pf.isError, pf.getDiagnostics.map(_.getMessage()).mkString("\n"))
+    // Generate C code from the DFDL schema
+    val codeGeneratorState = pf.generateCode()
+    val generatedCodeCompiler = new GeneratedCodeCompiler(pf)
+    val rootElementName = "c1"
+    generatedCodeCompiler.compile(rootElementName, codeGeneratorState)
+    assert(!pf.isError, pf.getDiagnostics.map(_.getMessage()).mkString("\n"))
+    // Run the executable to parse int32 numbers
+    val dp = generatedCodeCompiler.dataProcessor
+    val b = Misc.hex2Bytes("000000010000000200000003")
+    val input = new ByteArrayInputStream(b)
+    val pr = dp.parse(input)
+    assert(!pr.isError && pf.getDiagnostics.isEmpty, pr.getDiagnostics.map(_.getMessage()).mkString("\n"))
+    // Create an internal Daffodil infoset from the XML file
+    val (infoset: DIComplex, _, tunables) = TestInfoset.testInfoset(testSchema, pr.infosetAsXML)
+    assert(infoset.hasVisibleChildren)
+    assert(infoset.erd.name == "c1")

Review comment:
       Good idea, I would like a way to validate the infoset against the schema.

##########
File path: daffodil-runtime2/src/main/c/daffodil_main.c
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "common_runtime.h" // for walkInfoset, InfosetBase, ERD, ...
+#include "daffodil_argp.h"  // for daffodil_cli, parse_daffodil_cli, ...
+#include "generated_code.h" // for rootInfoset

Review comment:
       I put the prototype for the rootInfoset() function in generated_code.h because generated_code.c implements that function.  The prototype is generic, however, so it could be in another header file or inlined here.
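
       To illustrate why the prototype is generic, here is a rough sketch.  The one-field InfosetBase, the rootInfoset signature, the c1 struct layout, and the root_name helper are all assumptions made for this example rather than code copied from the pull request.  The point is only that the caller never needs to see a schema-specific type.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the runtime's InfosetBase (hypothetical, simplified). */
typedef struct InfosetBase
{
    const char *name;
} InfosetBase;

/* Generic prototype: it mentions no schema-specific type, so it could
 * live in any header or be declared directly in the main program. */
extern InfosetBase *rootInfoset(void);

/* What a generated file might contain for a root element named c1. */
typedef struct c1
{
    InfosetBase _base;
    int         e1;
} c1;

InfosetBase *
rootInfoset(void)
{
    static c1 instance = {{"c1"}, 0};
    return &instance._base;
}

/* A caller needs only the generic prototype, not the c1 struct. */
const char *
root_name(void)
{
    InfosetBase *root = rootInfoset();
    assert(root != NULL);
    return root->name;
}
```

       Keeping the prototype next to its implementation, as the pull request does, is equally valid; the sketch only shows that moving it would not expose any generated types.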

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns
+         |        "$ns", // namedQName.ns
+         |    },""".stripMargin
+    qnameInit
+  }
+
+  def addComplexTypeERD(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val count = structs.top.declarations.length
+    val offsetComputations = structs.top.offsetComputations.mkString(",\n")
+    val erdComputations = structs.top.erdComputations.mkString(",\n")
+    val qnameInit = defineQNameInit(context)
+    val complexERD =
+      s"""static const $C ${C}_compute_ERD_offsets;
+         |
+         |static const ptrdiff_t ${C}_offsets[$count] = {
+         |$offsetComputations
+         |};
+         |
+         |static const ERD *${C}_childrenERDs[$count] = {
+         |$erdComputations
+         |};
+         |
+         |static const ERD ${C}_ERD = {
+         |$qnameInit
+         |    COMPLEX,                         // typeCode
+         |    $count,                               // numChildren
+         |    ${C}_offsets,                      // offsets
+         |    ${C}_childrenERDs,                 // childrenERDs
+         |    (ERDInitSelf)&${C}_initSelf,       // initSelf
+         |    (ERDParseSelf)&${C}_parseSelf,     // parseSelf
+         |    (ERDUnparseSelf)&${C}_unparseSelf, // unparseSelf
+         |};
+         |""".stripMargin
+    erds += complexERD
+  }
+
+  def addStruct(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val declarations = structs.top.declarations.mkString("\n")
+    val struct =
+      s"""typedef struct $C
+         |{
+         |    InfosetBase _base;
+         |$declarations
+         |} $C;
+         |""".stripMargin
+    finalStructs += struct
+    val initStatement = s"    instance->_base.erd = &${C}_ERD;"
+    structs.top.initStatements += initStatement
+  }
+
+  def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def addComplexTypeStatements(child: ElementBase): Unit = {
+    val C = child.namedQName.local
+    val e = child.name
+    val initStatement = s"    ${C}_initSelf(&instance->$e);"
+    val parseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_parseSelf(&instance->$e, pstate);
+         |    }""".stripMargin
+    val unparseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_unparseSelf(&instance->$e, ustate);
+         |    }""".stripMargin
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def pushComplexElement(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    structs.push(new ComplexCGState(C))
+  }
+
+  def popComplexElement(context: ElementBase): Unit = {
+    structs.pop()
+  }
+
+  def addSimpleTypeERD(context: ElementBase): Unit = {
+    val e = context.namedQName.local
+    val qnameInit = defineQNameInit(context)
+    val typeCode = context.optPrimType.get match {
+      case PrimType.Int => "PRIMITIVE_INT32"
+      case PrimType.String => "PRIMITIVE_STRING"
+      case p: PrimType => context.SDE("PrimType %s not supported yet.", p.toString)
+    }
+    val erd =
+      s"""static const ERD ${e}_ERD = {
+         |$qnameInit
+         |    $typeCode, // typeCode
+         |    0,               // numChildren
+         |    NULL,            // offsets
+         |    NULL,            // childrenERDs
+         |    NULL,            // initSelf
+         |    NULL,            // parseSelf
+         |    NULL,            // unparseSelf
+         |};
+         |""".stripMargin
+    erds += erd
+    addComputations(context)
+  }
+
+  def addComputations(child: ElementBase): Unit = {
+    val C = structs.top.C
+    val e = child.namedQName.local
+    val offsetComputation = s"    (char *)&${C}_compute_ERD_offsets.$e - (char *)&${C}_compute_ERD_offsets"
+    val erdComputation = s"    &${e}_ERD"
+    structs.top.offsetComputations += offsetComputation
+    structs.top.erdComputations += erdComputation
+  }
+
+  def addFieldDeclaration(context: ThrowsSDE, child: ElementBase): Unit = {
+    val definition = if (child.isSimpleType) {
+      import NodeInfo.PrimType
+      child.optPrimType.get match {
+        case PrimType.Long => "int64_t    "
+        case PrimType.Int => "int32_t    "
+        case x => context.SDE("Unsupported primitive type: " + x)
+      }
+    } else {
+      child.namedQName.local + "         "
+    }
+    structs.top.declarations += s"    $definition ${child.name};"
+  }
+
+  def viewCodeHeader: String = {
+    val structs = finalStructs.mkString("\n")
+    val header =
+      s"""#ifndef GENERATED_CODE_H
+         |#define GENERATED_CODE_H
+         |
+         |#include "common_runtime.h" // for InfosetBase
+         |#include <stdint.h>         // for int32_t
+         |
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |extern InfosetBase *rootInfoset();
+         |
+         |// Define some infoset structures
+         |
+         |$structs
+         |#endif // GENERATED_CODE_H
+         |""".stripMargin
+    header
+  }
+
+  def viewCodeFile(rootElementName: String): String = {
+    val prototypes = this.prototypes.mkString("\n")
+    val erds = this.erds.mkString("\n")
+    val finalImplementation = this.finalImplementation.mkString("\n")
+    val code =
+      s"""#include "generated_code.h"
+         |#include <endian.h> // for be32toh, htobe32
+         |#include <errno.h>  // for errno
+         |#include <stddef.h> // for ptrdiff_t
+         |#include <stdio.h>  // for NULL, fread, fwrite, size_t, feof, ferror, FILE
+         |#include <string.h> // for strerror
+         |
+         |// Prototypes needed for compilation
+         |
+         |$prototypes
+         |
+         |// Metadata singletons
+         |
+         |$erds
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |InfosetBase *
+         |rootInfoset()
+         |{
+         |    static $rootElementName    instance;
+         |    InfosetBase *root = &instance._base;
+         |    ${rootElementName}_ERD.initSelf(root);
+         |    return root;
+         |}
+         |
+         |// Methods to initialize, parse, and unparse infoset nodes
+         |
+         |static const char *

Review comment:
       Yes, eof_or_error_msg can be static code in libruntime2.a; I simply hadn't considered the possibility of defining it outside of the generated_code.c file.
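
   A minimal sketch of what such a shared helper might look like once it lives in the static library rather than in the generated file. The signature and the message strings are assumptions made for illustration, not the code in this pull request:

```c
#include <stdio.h> // for FILE, NULL, feof, ferror

// Hypothetical helper: return NULL if the stream has neither its eof
// nor its error indicator set, otherwise return a message saying which
// indicator was found after a short fread or fwrite
const char *
eof_or_error_msg(FILE *stream)
{
    if (feof(stream))
    {
        return "Found eof indicator after reading stream";
    }
    if (ferror(stream))
    {
        return "Found error indicator after reading or writing stream";
    }
    return NULL;
}
```

   Generated parseSelf/unparseSelf code would then only need a prototype for the helper, and the function body would be compiled once into libruntime2.a.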

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/api/DFDLParserUnparser.scala
##########
@@ -202,6 +204,14 @@ object DFDL {
     def parse(ab: Array[Byte]): Unit
   }
 
+  trait CodeGeneratorState {

Review comment:
       I wasn't sure whether we needed this either.  I put my best guess as to why it was there in a comment below this line.

##########
File path: daffodil-runtime2/src/main/c/common_runtime.c
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "common_runtime.h"
+
+// walkInfosetNode - recursively walk an infoset node and call
+// VisitEventHandler methods
+
+static const char *
+walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode)

Review comment:
       Yes, I know Sphinx and other similar tools exist.  Will look into using one of them.

##########
File path: daffodil-runtime2/src/main/c/common_runtime.h
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef COMMON_RUNTIME_H
+#define COMMON_RUNTIME_H
+
+#include <stddef.h> // for ptrdiff_t
+#include <stdint.h> // for int32_t
+#include <stdio.h>  // for FILE, size_t
+
+// Prototypes needed for compilation
+
+typedef struct ElementRuntimeData ERD;
+typedef struct InfosetBase        InfosetBase;
+typedef struct PState             PState;
+typedef struct UState             UState;
+typedef struct VisitEventHandler  VisitEventHandler;
+
+typedef void (*ERDInitSelf)(InfosetBase *infoNode);
+typedef const char *(*ERDParseSelf)(InfosetBase * infoNode,
+                                    const PState *pstate);
+typedef const char *(*ERDUnparseSelf)(const InfosetBase *infoNode,
+                                      const UState *     ustate);
+
+typedef const char *(*VisitStartDocument)(const VisitEventHandler *handler);
+typedef const char *(*VisitEndDocument)(const VisitEventHandler *handler);
+typedef const char *(*VisitStartComplex)(const VisitEventHandler *handler,
+                                         const InfosetBase *      base);
+typedef const char *(*VisitEndComplex)(const VisitEventHandler *handler,
+                                       const InfosetBase *      base);
+typedef const char *(*VisitInt32Elem)(const VisitEventHandler *handler,
+                                      const ERD *erd, const int32_t *location);
+
+// NamedQName - name of an infoset element
+
+typedef struct NamedQName
+{
+    char *name;  // element name (including prefix if any)
+    char *xmlns; // xmlns attribute name (including prefix if any)

Review comment:
       Yes, this part of the ERD was TDML runner and XML-conversion specific.  I'm fine with pushing back and asking to keep things needed only for TDML/XML separate.

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *number_from_xml = mxmlGetOpaque(reader->node);
+    const char *name_from_infoset = erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);

Review comment:
       Nope, unless you want to change other places as well as this one.

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *number_from_xml = mxmlGetOpaque(reader->node);
+    const char *name_from_infoset = erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        if (strcmp(name_from_xml, name_from_infoset) == 0)
+        {
+            // Check for errors when reading the 32-bit integer
+            char *      endptr = NULL;
+            errno = 0; // To distinguish success/failure after call

Review comment:
       Sounds good, will use strtonum().
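
   For comparison, here is a rough sketch of the checks that strtonum() folds into a single call, written with plain strtol() so it builds without the BSD function (on Linux strtonum() generally comes from libbsd). The helper name text_to_int32 and its messages are made up for illustration and are not the code in this pull request:

```c
#include <errno.h>  // for errno, ERANGE
#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
#include <stdlib.h> // for NULL, strtol

// Hypothetical helper: convert decimal text to an int32_t, returning
// NULL on success or an error message on failure
const char *
text_to_int32(const char *text, int32_t *location)
{
    if (text == NULL)
    {
        return "Found no text to convert";
    }

    char *endptr = NULL;
    errno = 0; // strtol reports overflow of long only through errno
    const long value = strtol(text, &endptr, 10);

    if (endptr == text)
    {
        return "Found no number in text";
    }
    if (*endptr != '\0')
    {
        return "Found non-number characters after number";
    }
    if (errno == ERANGE || value < INT32_MIN || value > INT32_MAX)
    {
        return "Number is out of range for int32_t";
    }

    *location = (int32_t)value;
    return NULL;
}
```

   With strtonum() the range limits are passed as arguments and the error comes back as a string, so most of these branches would collapse into one call plus one check.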

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element

Review comment:
       The whitespace being skipped is before the `<foo>43</foo>`, not inside the `<foo>` element itself.
   
   Not sure if we need to handle elements with CDATA bracketing specially.  We need to test and find out.  
   
   Entities are passed through as entities for the application to deal with.

##########
File path: daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/runtime2/Runtime2TDMLDFDLProcessor.scala
##########
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.tdml.processor.runtime2
+
+import org.apache.daffodil.api._
+import org.apache.daffodil.compiler.Compiler
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.runtime2.GeneratedCodeCompiler
+import org.apache.daffodil.runtime2.ParseResult
+import org.apache.daffodil.runtime2.UnparseResult
+import org.apache.daffodil.tdml.processor._
+import org.apache.daffodil.xml.XMLUtils
+
+import scala.xml.Node
+
+final class TDMLDFDLProcessorFactory private(
+  private var compiler: Compiler,
+  private var checkAllTopLevel: Boolean,
+  validateDFDLSchemasArg: Boolean)
+  extends AbstractTDMLDFDLProcessorFactory {
+
+  override def validateDFDLSchemas = validateDFDLSchemasArg
+
+  override type R = TDMLDFDLProcessorFactory
+
+  override def implementationName = "daffodil-runtime2"
+
+  def this() = this(compiler = Compiler(validateDFDLSchemas = true),
+    checkAllTopLevel = false,
+    validateDFDLSchemasArg = true)
+
+  private def copy(
+    compiler: Compiler = compiler,
+    checkAllTopLevel: Boolean = checkAllTopLevel,
+    validateDFDLSchemas: Boolean = validateDFDLSchemas) =
+    new TDMLDFDLProcessorFactory(compiler, checkAllTopLevel, validateDFDLSchemas)
+
+  /**
+   * Deprecated methods must be implemented. Some are just stubs though now.
+   */
+  @deprecated("Use withValidateDFDLSchemas.", "2.6.0")
+  override def setValidateDFDLSchemas(bool: Boolean): Unit = {
+    compiler = compiler.withValidateDFDLSchemas(bool)
+  }
+
+  override def withValidateDFDLSchemas(bool: Boolean): TDMLDFDLProcessorFactory = {
+    copy(compiler = compiler.withValidateDFDLSchemas(bool))
+  }
+
+  @deprecated("Use withCheckAllTopLevel.", "2.6.0")
+  override def setCheckAllTopLevel(checkAllTopLevel: Boolean): Unit = {
+    compiler = compiler.withCheckAllTopLevel(checkAllTopLevel)
+  }
+
+  override def withCheckAllTopLevel(checkAllTopLevel: Boolean): TDMLDFDLProcessorFactory = {
+    copy(compiler = compiler.withCheckAllTopLevel(checkAllTopLevel))
+  }
+
+  @deprecated("Use withTunables.", "2.6.0")
+  override def setTunables(tunables: Map[String, String]): Unit =
+    compiler = compiler.withTunables(tunables)
+
+  override def withTunables(tunables: Map[String, String]): TDMLDFDLProcessorFactory =
+    copy(compiler = compiler.withTunables(tunables))
+
+  @deprecated("Use DaffodilTDMLDFDLProcessor.setExternalDFDLVariables.", "2.6.0")
+  override def setExternalDFDLVariables(externalVarBindings: Seq[Binding]): Unit =
+    compiler = compiler.withExternalDFDLVariablesImpl(externalVarBindings)
+
+  override def withExternalDFDLVariables(externalVarBindings: Seq[Binding]): TDMLDFDLProcessorFactory =
+    copy(compiler = compiler.withExternalDFDLVariablesImpl(externalVarBindings))
+
+  @deprecated("Use arguments to getProcessor()", "2.6.0")
+  override def setDistinguishedRootNode(name: String, namespace: String): Unit =
+    compiler = compiler.withDistinguishedRootNode(name, namespace)
+
+  // We're going to replace this method with different code.
+  // Return result is a TDML.CompileResult - so it's the result
+  // of compiling the schema for the test.
+  override def getProcessor(
+    schemaSource: DaffodilSchemaSource,
+    useSerializedProcessor: Boolean,
+    optRootName: Option[String] = None,
+    optRootNamespace: Option[String] = None): TDML.CompileResult = {
+    val pf = compiler.compileSource(schemaSource, optRootName, optRootNamespace)
+    val res = if (pf.isError) {
+      Left(pf.getDiagnostics) // DFDL schema compilation diagnostics
+    } else {
+      // How can we move some of these calls to ProcessorFactory with tunable runtime = "runtime2"?
+      val rootElementName = optRootName.getOrElse("FIXME")

Review comment:
       Sure, will add // FIXME as well.

##########
File path: daffodil-runtime2/src/main/c/daffodil_argp.h
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef DAFFODIL_ARGP_H
+#define DAFFODIL_ARGP_H
+
+// Parse our "daffodil" command line interface
+
+extern int parse_daffodil_cli(int argc, char **argv);
+
+// Get our "daffodil" CLI options
+
+extern struct daffodil_cli
+{
+    enum daffodil_subcommand
+    {
+        DAFFODIL_NONE,
+        DAFFODIL_PARSE,
+        DAFFODIL_UNPARSE
+    } subcommand;
+    int verbosity;
+} daffodil_cli;
+
+// Get our "daffodil parse" CLI options
+
+extern struct daffodil_parse_cli
+{
+    const char *infoset_type;

Review comment:
       I was mirroring the "--infoset-type" CLI option's name but I'm fine with "external_infoset_rep".

##########
File path: daffodil-runtime2/src/main/c/.gitignore
##########
@@ -0,0 +1,24 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# Ignore in case we compile the C files by hand
+*.a
+*.o
+daffodil
+
+# Ignore since these files are not fit for source control
+GPATH
+GRTAGS
+GTAGS

Review comment:
       I'll move the excludes to the root .gitignore file.
   
   The G* files are binary files generated by Visual Studio Code to keep track of where symbols are defined, and they get generated alongside the C source files by default.  I may not be able to change that, but we'll see.

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror, strlen, strncmp
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)

Review comment:
       The walkers aren't necessarily aligned with each other until this point.  The mxmlWalkNext(...) calls above run conditionally, only as needed to skip whitespace; we then need the last mxmlWalkNext(...) to skip past the element's name and set up the next walk.
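   
   The ordering described here can be sketched without Mini-XML.  This is only an illustration: `Tok`, `TokReader`, and `start_element` are hypothetical stand-ins (a flat token array instead of an mxml tree), not the code in xml_reader.c.  It shows the conditional advances that skip whitespace, followed by the one unconditional advance past the element, and only then the name comparison.
   
   ```c
   #include <stddef.h> // for NULL, size_t
   #include <string.h> // for strcmp
   
   // Hypothetical stand-ins for Mini-XML nodes: a flat array of tokens
   typedef enum { TOK_WHITESPACE, TOK_ELEMENT } TokType;
   
   typedef struct
   {
       TokType     type;
       const char *name; // element name, or NULL for whitespace
   } Tok;
   
   typedef struct
   {
       const Tok *toks;
       size_t     count;
       size_t     pos; // index of the current token
   } TokReader;
   
   // Return NULL if the next element matches the expected name,
   // otherwise return an error message
   static const char *
   start_element(TokReader *reader, const char *name_from_infoset)
   {
       // Conditional advances: skip whitespace only if there is any
       while (reader->pos < reader->count &&
              reader->toks[reader->pos].type == TOK_WHITESPACE)
       {
           reader->pos++;
       }
       if (reader->pos >= reader->count)
       {
           return "Ran out of XML data";
       }
   
       // Unconditional advance: remember the name, then step past the
       // element so the next call starts in the right place
       const char *name_from_xml = reader->toks[reader->pos].name;
       reader->pos++;
   
       // Only now can the two names be compared
       if (name_from_xml == NULL || strcmp(name_from_xml, name_from_infoset) != 0)
       {
           return "Found mismatch between XML data and infoset";
       }
       return NULL;
   }
   ```
   
   Calling `start_element` once per infoset node keeps the reader and the infoset walker in step as long as every call returns NULL.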

##########
File path: daffodil-runtime2/src/main/c/generated_code.c
##########
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "generated_code.h"

Review comment:
       Correct, this file is an example and it was originally handcrafted.  Now it is identical to what gets generated if given the same schema.  However, it seems reasonable to change its name to generated_example1.c since we might want to use more examples later.

##########
File path: daffodil-runtime2/src/main/c/daffodil_argp.c
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "daffodil_argp.h" // for daffodil_cli, daffodil_parse_cli, ...
+#include <argp.h>          // for argp_state, argp_error, error_t, argp_parse
+#include <stdio.h>         // for sprintf
+#include <stdlib.h>        // for putenv, NULL
+#include <string.h>        // for strlen, strcmp
+
+// Initialize our "daffodil" name and version
+
+const char *argp_program_version = "Apache Daffodil (runtime2) 0.1";
+
+// Initialize our "daffodil parse" CLI options
+
+struct daffodil_parse_cli daffodil_parse = {
+    "xml", // default infoset type
+    "-",   // default infile
+    "-",   // default outfile
+};
+
+static const struct argp_option parse_options[] = {
+    {"infoset-type", 'I', "<infoset_type>", 0,
+     "Infoset type to output. Must be one of 'xml' or 'null'"},
+
+    {"output", 'o', "<file>", 0,
+     "Write output to a given file. If not given or is -, output is written to "
+     "stdout"},
+
+    {0}};
+
+static error_t parse_handler(int key, char *arg, struct argp_state *state);
+
+static const char parse_args_doc[] = "[infile]";
+
+static const char parse_doc[] =
+    "\n"
+    "Parse a file using a DFDL schema\n"
+    "\n"
+    "Parse Options:"
+    "\v"
+    " Trailing arguments:\n"
+    "  infile (not required)      input file to parse. "
+    "If not specified, or a value of -, reads from stdin";
+
+static const struct argp parse_argp = {
+    parse_options,  // array of CLI options
+    parse_handler,  // function to get these CLI options
+    parse_args_doc, // short usage documentation
+    parse_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil parse" CLI options
+
+static error_t
+parse_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_parse_cli *parse = state->input;
+
+    switch (key)
+    {
+    case 'I':
+        parse->infoset_type = arg;
+        break;
+
+    case 'o':
+        parse->outfile = arg;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (state->arg_num)
+        {
+            argp_error(state, "too many arguments: %s", arg);
+        }
+        parse->infile = arg;
+        break;
+
+    default:
+        return ARGP_ERR_UNKNOWN;
+    }
+
+    return 0;
+}
+
+// Parse our "daffodil parse" command line interface
+
+static error_t
+parse_daffodil_parse_cli(struct argp_state *state)
+{
+    int    argc = state->argc - state->next + 1;
+    char **argv = &state->argv[state->next - 1];
+    char * old_cmd = argv[0];
+    char   new_cmd[strlen(state->name) + strlen(" parse") + 1];
+
+    sprintf(new_cmd, "%s parse", state->name);
+    argv[0] = new_cmd;
+
+    error_t status = argp_parse(&parse_argp, argc, argv, ARGP_IN_ORDER, &argc,
+                                &daffodil_parse);
+
+    argv[0] = old_cmd;
+    state->next += argc - 1;
+
+    return status;
+}
+
+// Initialize our "daffodil unparse" CLI options
+
+struct daffodil_unparse_cli daffodil_unparse = {
+    "xml", // default infoset type
+    "-",   // default infile
+    "-",   // default outfile
+};
+
+static const struct argp_option unparse_options[] = {
+    {"infoset-type", 'I', "<infoset_type>", 0,
+     "Infoset type to unparse. Must be 'xml'"},
+
+    {"output", 'o', "<file>", 0,
+     "Write output to file. If not given or is -, output is written to "
+     "standard output"},
+
+    {0}};
+
+static error_t unparse_handler(int key, char *arg, struct argp_state *state);
+
+static const char unparse_args_doc[] = "[infile]";
+
+static const char unparse_doc[] =
+    "\n"
+    "Unparse an infoset file using a DFDL schema\n"
+    "\n"
+    "Unparse Options:"
+    "\v"
+    " Trailing arguments:\n"
+    "  infile (not required)      input file to unparse. If not specified, or "
+    "a value of -, reads from stdin";
+
+static const struct argp unparse_argp = {
+    unparse_options,  // array of CLI options
+    unparse_handler,  // function to get these CLI options
+    unparse_args_doc, // short usage documentation
+    unparse_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil unparse" CLI options
+
+static error_t
+unparse_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_unparse_cli *unparse = state->input;
+
+    switch (key)
+    {
+    case 'I':
+        unparse->infoset_type = arg;
+        break;
+
+    case 'o':
+        unparse->outfile = arg;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (state->arg_num)
+        {
+            argp_error(state, "too many arguments: %s", arg);
+        }
+        unparse->infile = arg;
+        break;
+
+    default:
+        return ARGP_ERR_UNKNOWN;
+    }
+
+    return 0;
+}
+
+// Parse our "daffodil unparse" command line interface
+
+static error_t
+parse_daffodil_unparse_cli(struct argp_state *state)
+{
+    int    argc = state->argc - state->next + 1;
+    char **argv = &state->argv[state->next - 1];
+    char * old_cmd = argv[0];
+    char   new_cmd[strlen(state->name) + strlen(" unparse") + 1];
+
+    sprintf(new_cmd, "%s unparse", state->name);
+    argv[0] = new_cmd;
+
+    error_t status = argp_parse(&unparse_argp, argc, argv, ARGP_IN_ORDER, &argc,
+                                &daffodil_unparse);
+
+    argv[0] = old_cmd;
+    state->next += argc - 1;
+
+    return status;
+}
+
+// Initialize our "daffodil" CLI options
+
+struct daffodil_cli daffodil_cli = {
+    DAFFODIL_NONE, // default subcommand
+    0,             // default verbosity
+};
+
+static const struct argp_option daffodil_options[] = {
+    {"verbose", 'v', 0, 0, "Increment verbosity level, one level for each -v",
+     -1},
+
+    {0}};
+
+static error_t daffodil_handler(int key, char *arg, struct argp_state *state);
+
+static const char daffodil_args_doc[] = "<subcommand> [SUBCOMMAND_OPTION...]";
+
+static const char daffodil_doc[] =
+    "\n"
+    "Global Options:"
+    "\v"
+    "Subcommands:\n"
+    "  parse         Parse data to a DFDL infoset\n"
+    "  unparse       Unparse a DFDL infoset\n"
+    "\n"
+    "Run 'daffodil <subcommand> --help' for subcommand specific options";
+
+static const struct argp daffodil_argp = {
+    daffodil_options,  // array of CLI options
+    daffodil_handler,  // function to get these CLI options
+    daffodil_args_doc, // short usage documentation
+    daffodil_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil" CLI options
+
+static error_t
+daffodil_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_cli *daffodil = state->input;
+    error_t              status = 0;
+
+    switch (key)
+    {
+    case 'v':
+        daffodil->verbosity++;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (strcmp(arg, "parse") == 0)
+        {
+            daffodil->subcommand = DAFFODIL_PARSE;
+            status = parse_daffodil_parse_cli(state);

Review comment:
       We are indeed using the argp library to shorten the code here (argp is part of the GNU C library).  Argp is much better than getopt for complicated CLI requirements.  I tried many variations of these names to keep them as clear as possible, but some of them are global names, which is why I used "daffodil" in them.  C really does need a lot of lines of code to implement CLI parsing even with the argp library, but many of these lines just initialize structures.
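   
   For readers unfamiliar with argp, here is a minimal single-level sketch of the same pattern (an option table, a handler callback, and a `struct argp` tying them together).  It assumes glibc's `<argp.h>` is available; `example_cli`, `example_handler`, and `parse_example_cli` are hypothetical names for illustration and are not part of daffodil_argp.c.  It passes ARGP_NO_EXIT so that errors are returned to the caller instead of exiting the program.
   
   ```c
   #include <argp.h>   // for argp_parse, argp_state, argp_option, argp_error, error_t
   #include <errno.h>  // for EINVAL
   #include <stddef.h> // for NULL
   
   // Hypothetical options structure filled in by the handler
   struct example_cli
   {
       int         verbosity;
       const char *outfile;
       const char *infile;
   };
   
   // Most of the "lines of code" are tables like this one
   static const struct argp_option example_options[] = {
       {"verbose", 'v', 0, 0, "Increment verbosity level"},
       {"output", 'o', "<file>", 0, "Write output to a given file"},
       {0}};
   
   // Callback invoked by argp for each option or argument
   static error_t
   example_handler(int key, char *arg, struct argp_state *state)
   {
       struct example_cli *cli = state->input;
   
       switch (key)
       {
       case 'v':
           cli->verbosity++;
           break;
   
       case 'o':
           cli->outfile = arg;
           break;
   
       case ARGP_KEY_ARG:
           if (state->arg_num > 0)
           {
               // With ARGP_NO_EXIT this prints a message and returns
               argp_error(state, "too many arguments: %s", arg);
               return EINVAL;
           }
           cli->infile = arg;
           break;
   
       default:
           return ARGP_ERR_UNKNOWN;
       }
   
       return 0;
   }
   
   static const struct argp example_argp = {
       example_options, // array of CLI options
       example_handler, // function to get these CLI options
       "[infile]",      // short usage documentation
       "Example CLI",   // long help documentation
   };
   
   // Parse the command line into cli; return 0 on success
   static error_t
   parse_example_cli(int argc, char **argv, struct example_cli *cli)
   {
       return argp_parse(&example_argp, argc, argv, ARGP_NO_EXIT, NULL, cli);
   }
   ```
   
   A caller would initialize `struct example_cli` with its defaults (for example "-" for both files) and pass `argc` and `argv` from `main`.  The subcommand handling in daffodil_argp.c layers a second `argp_parse` call on top of this same pattern.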

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns
+         |        "$ns", // namedQName.ns
+         |    },""".stripMargin
+    qnameInit
+  }
+
+  def addComplexTypeERD(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val count = structs.top.declarations.length
+    val offsetComputations = structs.top.offsetComputations.mkString(",\n")
+    val erdComputations = structs.top.erdComputations.mkString(",\n")
+    val qnameInit = defineQNameInit(context)
+    val complexERD =
+      s"""static const $C ${C}_compute_ERD_offsets;
+         |
+         |static const ptrdiff_t ${C}_offsets[$count] = {
+         |$offsetComputations
+         |};
+         |
+         |static const ERD *${C}_childrenERDs[$count] = {
+         |$erdComputations
+         |};
+         |
+         |static const ERD ${C}_ERD = {
+         |$qnameInit
+         |    COMPLEX,                         // typeCode
+         |    $count,                               // numChildren
+         |    ${C}_offsets,                      // offsets
+         |    ${C}_childrenERDs,                 // childrenERDs
+         |    (ERDInitSelf)&${C}_initSelf,       // initSelf
+         |    (ERDParseSelf)&${C}_parseSelf,     // parseSelf
+         |    (ERDUnparseSelf)&${C}_unparseSelf, // unparseSelf
+         |};
+         |""".stripMargin
+    erds += complexERD
+  }
+
+  def addStruct(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val declarations = structs.top.declarations.mkString("\n")
+    val struct =
+      s"""typedef struct $C
+         |{
+         |    InfosetBase _base;
+         |$declarations
+         |} $C;
+         |""".stripMargin
+    finalStructs += struct
+    val initStatement = s"    instance->_base.erd = &${C}_ERD;"
+    structs.top.initStatements += initStatement
+  }
+
+  def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def addComplexTypeStatements(child: ElementBase): Unit = {
+    val C = child.namedQName.local
+    val e = child.name
+    val initStatement = s"    ${C}_initSelf(&instance->$e);"
+    val parseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_parseSelf(&instance->$e, pstate);
+         |    }""".stripMargin
+    val unparseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_unparseSelf(&instance->$e, ustate);
+         |    }""".stripMargin
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def pushComplexElement(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    structs.push(new ComplexCGState(C))
+  }
+
+  def popComplexElement(context: ElementBase): Unit = {
+    structs.pop()
+  }
+
+  def addSimpleTypeERD(context: ElementBase): Unit = {
+    val e = context.namedQName.local
+    val qnameInit = defineQNameInit(context)
+    val typeCode = context.optPrimType.get match {
+      case PrimType.Int => "PRIMITIVE_INT32"
+      case PrimType.String => "PRIMITIVE_STRING"
+      case p: PrimType => context.SDE("PrimType %s not supported yet.", p.toString)
+    }
+    val erd =
+      s"""static const ERD ${e}_ERD = {
+         |$qnameInit
+         |    $typeCode, // typeCode
+         |    0,               // numChildren
+         |    NULL,            // offsets
+         |    NULL,            // childrenERDs
+         |    NULL,            // initSelf
+         |    NULL,            // parseSelf
+         |    NULL,            // unparseSelf
+         |};
+         |""".stripMargin
+    erds += erd
+    addComputations(context)
+  }
+
+  def addComputations(child: ElementBase): Unit = {
+    val C = structs.top.C
+    val e = child.namedQName.local
+    val offsetComputation = s"    (char *)&${C}_compute_ERD_offsets.$e - (char *)&${C}_compute_ERD_offsets"
+    val erdComputation = s"    &${e}_ERD"
+    structs.top.offsetComputations += offsetComputation
+    structs.top.erdComputations += erdComputation
+  }
+
+  def addFieldDeclaration(context: ThrowsSDE, child: ElementBase): Unit = {
+    val definition = if (child.isSimpleType) {
+      import NodeInfo.PrimType
+      child.optPrimType.get match {
+        case PrimType.Long => "int64_t    "
+        case PrimType.Int => "int32_t    "
+        case x => context.SDE("Unsupported primitive type: " + x)
+      }
+    } else {
+      child.namedQName.local + "         "
+    }
+    structs.top.declarations += s"    $definition ${child.name};"
+  }
+
+  def viewCodeHeader: String = {
+    val structs = finalStructs.mkString("\n")
+    val header =
+      s"""#ifndef GENERATED_CODE_H
+         |#define GENERATED_CODE_H
+         |
+         |#include "common_runtime.h" // for InfosetBase
+         |#include <stdint.h>         // for int32_t
+         |
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |extern InfosetBase *rootInfoset();
+         |
+         |// Define some infoset structures
+         |
+         |$structs
+         |#endif // GENERATED_CODE_H
+         |""".stripMargin
+    header
+  }
+
+  def viewCodeFile(rootElementName: String): String = {
+    val prototypes = this.prototypes.mkString("\n")
+    val erds = this.erds.mkString("\n")
+    val finalImplementation = this.finalImplementation.mkString("\n")
+    val code =
+      s"""#include "generated_code.h"
+         |#include <endian.h> // for be32toh, htobe32
+         |#include <errno.h>  // for errno
+         |#include <stddef.h> // for ptrdiff_t
+         |#include <stdio.h>  // for NULL, fread, fwrite, size_t, feof, ferror, FILE
+         |#include <string.h> // for strerror
+         |
+         |// Prototypes needed for compilation
+         |
+         |$prototypes
+         |
+         |// Metadata singletons
+         |
+         |$erds
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |InfosetBase *
+         |rootInfoset()
+         |{
+         |    static $rootElementName    instance;
+         |    InfosetBase *root = &instance._base;
+         |    ${rootElementName}_ERD.initSelf(root);
+         |    return root;
+         |}
+         |
+         |// Methods to initialize, parse, and unparse infoset nodes
+         |
+         |static const char *
+         |eof_or_error_msg(FILE *stream)
+         |{
+         |    if (feof(stream))
+         |    {
+         |        static const char *error_msg = "Got EOF while expecting more input";
+         |        return error_msg;
+         |    }
+         |    else if (ferror(stream))

Review comment:
       I know that strtol sets errno, but I already checked errno there, so I'm not sure whether something other than fread/fwrite might set errno before this function is called.  I'll take a closer look.
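   
   One way to rule out a stale errno, sketched here under assumptions rather than taken from the pull request, is to clear errno just before the read and capture it immediately afterwards.  `read_exact` is a hypothetical helper, and the fallback message for a zero errno is my own addition.
   
   ```c
   #include <errno.h>  // for errno
   #include <stdio.h>  // for FILE, fread, feof, size_t
   #include <string.h> // for strerror
   
   // Hypothetical helper: read exactly size bytes, returning NULL on
   // success or an error message on failure
   static const char *
   read_exact(void *buffer, size_t size, FILE *stream)
   {
       errno = 0; // discard any stale value left by earlier calls
       size_t count = fread(buffer, 1, size, stream);
       int    saved_errno = errno; // capture before other calls can change it
   
       if (count == size)
       {
           return NULL;
       }
       if (feof(stream))
       {
           return "Got EOF while expecting more input";
       }
       // Not EOF, so the short read was caused by a stream error
       return saved_errno ? strerror(saved_errno) : "Unknown read error";
   }
   ```
   
   Because the value is saved in a local variable, later library calls between the read and the error report cannot change which message is returned.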

##########
File path: build.sbt
##########
@@ -43,6 +46,32 @@ lazy val runtime1         = Project("daffodil-runtime1", file("daffodil-runtime1
                               .dependsOn(io, lib % "test->test", udf, macroLib % "compile-internal, test-internal")
                               .settings(commonSettings, usesMacros)
 
+val runtime2StaticLib     = Library("libruntime2.a")
+lazy val runtime2         = Project("daffodil-runtime2", file("daffodil-runtime2")).configs(IntegrationTest)

Review comment:
       So if I understand correctly, you would like to rename some projects like this:
   
   - rename `daffodil-core` to `daffodil-schema-compiler`
   - leave `daffodil-lib` alone
   - rename `daffodil-runtime1` to `daffodil-backend-scala-parser`
   - rename `daffodil-runtime1-unparser` to `daffodil-backend-scala-unparser`
   - rename `daffodil-runtime2` to `daffodil-backend-generator-c`
   
   I think those names are indeed more obvious to newcomers, although I would move `-scala` to the end to be consistent with `daffodil-backend-generator-c` (I like symmetric names).  Who knows, if the static runtime C code (not the dynamically generated runtime C code) grows large enough, we might refactor these C source files from `daffodil-backend-generator-c`  to `daffodil-backend-parser-c` and `daffodil-backend-unparser-c` too.
   
   I'll wait for more developers to weigh in on the proposed renaming before I rename anything, though.  We can do the renaming later too since we'll merge the pull request into its own separate branch to let multiple developers work on it before we merge it into the main branch.

##########
File path: daffodil-runtime2/src/main/c/common_runtime.c
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more

Review comment:
       Agreed, I remembered what you said here and raised the same point for wider discussion in an earlier comment.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.File
+import java.io.InputStream
+import java.io.OutputStream
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.api.DaffodilTunables
+import org.apache.daffodil.api.DataLocation
+import org.apache.daffodil.api.ValidationMode
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.processors.Failure
+import org.apache.daffodil.processors.ProcessorResult
+import org.apache.daffodil.processors.Success
+import org.apache.daffodil.processors.VariableMap
+import org.apache.daffodil.processors.WithDiagnosticsImpl
+import org.apache.daffodil.processors.parsers.ParseError
+import org.apache.daffodil.processors.unparsers.UnparseError
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+import os.Pipe
+
+/**
+ * Effectively a scala proxy object that does its work via the underlying C-code.
+ * Will need to consider how to use features of underlying C-code to get infoset,
+ * walk infoset, generate XML for use by TDML tests.
+ */
+class Runtime2DataProcessor(executableFile: os.Path) extends DFDL.DataProcessorBase {
+  /**
+   * Returns a data processor with all the same state, but the validation mode changed to that of the argument.
+   *
+   * Note that the default validation mode is "off", that is, no validation is performed.
+   */
+  override def withValidationMode(mode: ValidationMode.Type): DFDL.DataProcessor = ???
+
+  override def withTunable(name: String, value: String): DFDL.DataProcessor = ???
+
+  override def withTunables(tunables: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: File): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Seq[Binding]): DFDL.DataProcessor = ???
+
+  override def validationMode: ValidationMode.Type = ???
+
+  override def getTunables(): DaffodilTunables = ???
+
+  override def save(output: DFDL.Output): Unit = ???
+
+  override def variableMap: VariableMap = ???
+
+  override def setValidationMode(mode: ValidationMode.Type): Unit = ???
+
+  override def setExternalVariables(extVars: Map[String, String]): Unit = ???
+
+  override def setExternalVariables(extVars: File): Unit = ???
+
+  override def setExternalVariables(extVars: File, tunable: DaffodilTunables): Unit = ???
+
+  override def setExternalVariables(extVars: Seq[Binding]): Unit = ???
+
+  override def setTunable(tunable: String, value: String): Unit = ???
+
+  override def setTunables(tunables: Map[String, String]): Unit = ???
+
+  /**
+   * Returns an object which contains the result, and/or diagnostic information.
+   */
+  def parse(input: InputStream): ParseResult = {

Review comment:
       I agree that we can move the snap compilation logic to Runtime2TDMLDataProcessor so we may not need a Runtime2DataProcessor.  
   
   I haven't seen anyone discuss my own proposal: add an entirely new daffodil generate subcommand.  Here are the CLI commands proposed so far, with my own proposal last:
   
       daffodil runtime2 -s schema.dfdl.xsd -o schema.bin
       daffodil save-parser -s schema.dfdl.xsd outfile
       daffodil generate -s schema.dfdl.xsd outdir/
   
   I think that generate is sufficiently different from save-parser in semantics that it should be its own command.  Right now save-parser takes a single output file and writes to it, but generate would take a directory and write multiple files into it to give people the choice of an executable with a CLI and main method already compiled in or headers and libraries that people can add to their own application.  I also think generate is a more obvious name than runtime2 and allows the possibility of different backends using the same "generate" command to generate C, Java, or VHDL files (we can use a "runtime" tunable or a CLI option when we have multiple generator backends).
   
   I'm also intrigued by the suggestion to allow the Daffodil Scala CLI to ask the different backends for their CLI options.  The generate command can offer different CLI options depending on the backend in that case.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/GeneratedCodeCompiler.scala
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import org.apache.commons.io.FileUtils
+import org.apache.daffodil.compiler.ProcessorFactory
+import org.apache.daffodil.dsom.SchemaDefinitionError
+import org.apache.daffodil.runtime2.generators.CodeGeneratorState
+import org.apache.daffodil.util.Misc
+import os.Pipe
+
+class GeneratedCodeCompiler(pf: ProcessorFactory) {

Review comment:
       What do people think?  Should we compile the static runtime C code when we build Daffodil and distribute the static lib in the zip/tar/rpm or should we put the C source files in src/main/resources/c/libruntime2 in order to extract them from the jar when it's time to compile all of the C source files for the first time on a user's machine?  I want to keep things as simple as possible until requirements become clearer, but we definitely should come to a consensus on when to compile the static library.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/GrammarTerm.scala
##########
@@ -51,7 +52,8 @@ import org.apache.daffodil.dsom.Term
 abstract class Gram(contextArg: SchemaComponent)
   extends OOLAGHostImpl(contextArg)
   with BasicComponent
-  with GramRuntime1Mixin {
+  with GramRuntime1Mixin
+  with GramRuntime2Mixin {

Review comment:
       I like Brandon's suggestion to turn generateCode() from a method of Gram (through a mixin trait) into a simple function in daffodil-backend-generator-c that takes Gram as an argument.  However, Gram's generateCode() is not the only method we have added to traits/classes in runtime1.  I'll have to evaluate how easy it would be to move everything to daffodil-backend-generator-c.  I'll start with Gram's generateCode() at least so we can see what else is left to move.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498430401



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/GrammarTerm.scala
##########
@@ -51,7 +52,8 @@ import org.apache.daffodil.dsom.Term
 abstract class Gram(contextArg: SchemaComponent)
   extends OOLAGHostImpl(contextArg)
   with BasicComponent
-  with GramRuntime1Mixin {
+  with GramRuntime1Mixin
+  with GramRuntime2Mixin {

Review comment:
       Well, by way of discussion, a bunch of points here.
   
   So one issue is that runtime2 isn't independent of runtime1. I believe there are things created when runtime1 creates the parsers and unparsers that are used by runtime2. 
   
   There's also a hypothesis here, that the backends could be pluggable because the schema compiler doesn't have to do backend-specific things. The schema compiler has a backend-independent aspect, which creates and hooks together the grammar objects. But the implementation of the grammar objects is the _essence_ of the back-end. 
   
   A back end is (a) the part of the schema compiler (grammar objects) that create data structures for that back end, and (b) a library of primitives. 
   
   Part (a) here is part of daffodil-core pretty much inherently right now. Grammar primitives use all the data structures of the schema compiler: the DSOM objects are available, and are used as context for issuing SDEs from the grammar objects.
   
   Part (b) is the daffodil-runtimeX module. For runtime2 this is a tiny thing right now, but it would of course grow rapidly as we expand beyond just 32-bit integers. 
   
   Every backend is going to add code to the grammar primitives. Right now those grammar primitives support 3 methods that backends implement: parser/unparser, which create Scala data structures that are invocable to carry out parsing/unparsing, and now generateCode, which carries out a generation process and, like parser/unparser, can cause SDEs to be issued, but otherwise is sort of the end of the line for the compiler.
   
   If we wanted to separate backends from the rest of the schema compiler in a pluggable way, I think the grammar objects would not have parser/unparser/code-generator methods on them at all. Instead the grammar data structure would be the output of the schema compiler. The back-end implementation of each grammar primitive's compilation would be found and delegated to, or maybe the grammar data structure would be walked/visited by a separate back-end specific compiler pass.
   
   The challenge with this refactoring is doing it without having to reimplement an echo of each grammar object class in each back-end.  
   
   Object-oriented mixins couple everything together tightly, but they minimize the delegation-and-walker kind of "glue logic" that makes code so tedious to maintain. 
   
   Given that runtime2 depends on runtime1 currently, this didn't seem worth it as of yet. 
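
   To make the "walked/visited by a separate back-end specific compiler pass" idea concrete, here is a minimal sketch. It is written in C only because that is the language runtime2 generates; the real schema compiler is Scala. Every name in it (GramKind, GramNode, Backend, walk_gram, the cgen_* callbacks) is hypothetical and does not exist in Daffodil. It shows a backend as a table of callbacks that a backend-independent walker invokes once per grammar node.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical grammar node kinds; the real Gram hierarchy is far richer.
typedef enum { GRAM_SEQ, GRAM_INT32 } GramKind;

typedef struct GramNode {
    GramKind kind;
    const char *name;
    const struct GramNode *const *children; // NULL for leaf nodes
    size_t numChildren;
} GramNode;

// A backend is just a table of callbacks, one per grammar primitive.
typedef struct Backend {
    void (*enterSeq)(const GramNode *node, void *state);
    void (*leaveSeq)(const GramNode *node, void *state);
    void (*visitInt32)(const GramNode *node, void *state);
} Backend;

// Backend-independent walker: knows the tree shape, nothing about output.
static void walk_gram(const GramNode *node, const Backend *backend, void *state)
{
    switch (node->kind) {
    case GRAM_SEQ:
        backend->enterSeq(node, state);
        for (size_t i = 0; i < node->numChildren; i++) {
            walk_gram(node->children[i], backend, state);
        }
        backend->leaveSeq(node, state);
        break;
    case GRAM_INT32:
        backend->visitInt32(node, state);
        break;
    }
}

// Toy "C generator" backend: appends struct text to a fixed-size buffer.
typedef struct CGenState {
    char text[256];
} CGenState;

static void cgen_append(CGenState *s, const char *prefix, const char *name, const char *suffix)
{
    size_t used = strlen(s->text);
    snprintf(s->text + used, sizeof(s->text) - used, "%s%s%s", prefix, name, suffix);
}

static void cgen_enterSeq(const GramNode *n, void *st) { cgen_append(st, "struct ", n->name, " { "); }
static void cgen_leaveSeq(const GramNode *n, void *st) { (void)n; cgen_append(st, "", "", "};"); }
static void cgen_visitInt32(const GramNode *n, void *st) { cgen_append(st, "int32_t ", n->name, "; "); }

static const Backend c_generator = { cgen_enterSeq, cgen_leaveSeq, cgen_visitInt32 };
```

   Walking a GRAM_SEQ named "root" with two GRAM_INT32 children "a" and "b" through c_generator leaves "struct root { int32_t a; int32_t b; };" in the state buffer. The cost described above is visible even at this size: each new grammar primitive needs a new enum value, a new case in the walker, and a new callback in every backend.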
   
   







[GitHub] [incubator-daffodil] mbeckerle commented on pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#issuecomment-701442102


   I suggest the branch we put this on not be named with a "wip-" prefix, because it's only a WIP until it isn't. Also, "daffodil" in the branch name isn't needed. I suggest naming the branch runtime2-2202. The ticket number is still useful because if you look up the ticket there are links and info there (or should be). 





[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499696824



##########
File path: daffodil-runtime2/src/main/c/common_runtime.c
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more

Review comment:
       By "shared" library, I did not intend to suggest ".so" vs. ".a", but just "shared" generally, as in not needing to be duplicated. I do understand many users would want ".a" type linkage. 
   
   I think we want people to be able to run TDML tests against the C-generating backend. Leveraging TDML tests for the overlapping portable subset of DFDL that both backends support is critical to success. 
   
   That suggests we need some "on the fly" way of compiling, linking, and running the C code. 
   
   As for packaging for "real" usage, like when the user just wants to compile their DFDL schema and take the C code away to some other system, I'm less concerned about this. Just creating a zip/jar of the include and C-code directories would seem sufficient to me. But we do want it to be 100% clear which part is generated, and so would be discarded and overwritten by new generated code if they recompile the schema. That needs to be distinguished from the static "library" code (and its headers), which would not be regenerated. 
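
   A rough sketch of that split, using the error_msg chaining style from the generated parseSelf functions in this PR. The names and layout here (ByteCursor, lib_parse_be_int32, ex_point) are assumptions made up for illustration; the PR's actual PState and library functions may look quite different. The first half stands for static library code that is never regenerated, the second half for code that is overwritten each time the schema is recompiled.

```c
#include <stddef.h>
#include <stdint.h>

/* ---- static "library" part: hand-written, never regenerated ---- */

// Hypothetical parse state: a cursor over an in-memory buffer.
// Callers must keep position <= length.
typedef struct ByteCursor {
    const uint8_t *data;
    size_t length;
    size_t position;
} ByteCursor;

// Reads one 32-bit big-endian signed integer.
// Returns NULL on success or a static error message on failure.
static const char *lib_parse_be_int32(int32_t *out, ByteCursor *cursor)
{
    if (cursor->length - cursor->position < 4) {
        return "Not enough data to parse a 32-bit integer";
    }
    const uint8_t *p = cursor->data + cursor->position;
    uint32_t bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    // Conversion of out-of-range values is implementation-defined in C,
    // but yields the two's complement value on common platforms.
    *out = (int32_t)bits;
    cursor->position += 4;
    return NULL;
}

/* ---- generated part: overwritten on every schema recompile ---- */

typedef struct ex_point {
    int32_t x;
    int32_t y;
} ex_point;

// Each step runs only if no earlier step reported an error.
static const char *ex_point_parseSelf(ex_point *instance, ByteCursor *cursor)
{
    const char *error_msg = NULL;
    if (error_msg == NULL) {
        error_msg = lib_parse_be_int32(&instance->x, cursor);
    }
    if (error_msg == NULL) {
        error_msg = lib_parse_be_int32(&instance->y, cursor);
    }
    return error_msg;
}
```

   Keeping the two halves in separate files or directories would let a user delete and regenerate the second half without touching the first.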
   
   







[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r497838711



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns

Review comment:
       This should be the prefix string, or null. It should not be an artifact of conversion to XML wanting to make an xmlns:pre binding out of it via the mxml library. 
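
   One way to read this suggestion: the generated ERD stores only the raw prefix (or NULL), and whatever code writes XML derives the xmlns attribute name at the moment it needs one. A minimal sketch in C follows. The NamedQName field layout and the xmlns_attr_name helper are assumptions for illustration, not the PR's actual definitions.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical QName layout: prefix is NULL when the element has no prefix.
typedef struct NamedQName {
    const char *prefix; // e.g. "ex", or NULL
    const char *local;  // e.g. "c1"
    const char *ns;     // namespace URI, or NULL if no namespace
} NamedQName;

// Builds the attribute name for a namespace binding only when XML is written:
// "xmlns:<prefix>" if there is a prefix, plain "xmlns" otherwise.
// Returns 0 on success, -1 if buf is too small to hold the name.
static int xmlns_attr_name(const NamedQName *qname, char *buf, size_t buflen)
{
    int n = (qname->prefix != NULL)
        ? snprintf(buf, buflen, "xmlns:%s", qname->prefix)
        : snprintf(buf, buflen, "xmlns");
    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

   With this shape the XML-specific "xmlns:" spelling lives in one place in the writer, and a non-XML infoset output could use the prefix directly.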







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499579499



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/GeneratedCodeCompiler.scala
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import org.apache.commons.io.FileUtils
+import org.apache.daffodil.compiler.ProcessorFactory
+import org.apache.daffodil.dsom.SchemaDefinitionError
+import org.apache.daffodil.runtime2.generators.CodeGeneratorState
+import org.apache.daffodil.util.Misc
+import os.Pipe
+
+class GeneratedCodeCompiler(pf: ProcessorFactory) {
+  private var executableFile: os.Path = _
+  private lazy val isWindows = System.getProperty("os.name").toLowerCase().startsWith("windows")
+
+  /**
+   * Compiles and links generated C code with runtime2 library to
+   * build an executable file.
+   */
+  def compile(rootElementName: String, codeGeneratorState: CodeGeneratorState): Unit = {
+    val compiler = "cc"
+    val location = Option(this.getClass.getProtectionDomain.getCodeSource) flatMap (x => Option(x.getLocation))
+    val wd = if (os.exists(os.pwd/"daffodil-runtime2"))
+      os.pwd
+    else if (os.exists(os.pwd/os.up/"daffodil-runtime2"))
+      os.pwd/os.up
+    else if (location.isDefined)
+      os.Path(FileUtils.toFile(location.get))/os.up/os.up
+    else
+      os.pwd
+    val includeDir = if (os.exists(wd/"include"))
+      wd/"include"
+    else
+      wd/"daffodil-runtime2"/"src"/"main"/"c"
+    val libDir = if (os.exists(wd/"lib"))
+      wd/"lib"
+    else
+      wd/"daffodil-runtime2"/"target"/"streams"/"compile"/"ccTargetMap"/"_global"/"streams"/"compile"/"sbtcc.Library"

Review comment:
       Personally I'd prefer the latter. This feels very fragile to me.







[GitHub] [incubator-daffodil] bsloane1650 commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
bsloane1650 commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499688509



##########
File path: daffodil-runtime2/src/main/c/common_runtime.c
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more

Review comment:
       I'd rather not get in the business of shipping precompiled static libraries.
   
   At the moment, Daffodil is mostly platform independent by virtue of running on the JVM. This goes away if we include compiled C code. Further, providing a shared library for multiple supported systems from a single build would involve setting up cross compilation, which is another headache.
   
   Since our output is C source code anyway, the user already needs to go through a C compile step regardless, so I do not think we are saving them much by precompiling the common runtime.
   
   Further, a common use case for runtime2 would likely be embedded systems, where space is at a premium. Users of such systems may prefer to compile everything at once with static linking in order to benefit from the size reductions of dead code elimination (as portions of the runtime may not be used by every format).
   
   If we do want to provide a shared library, I don't think including it in the jar makes sense. You need the .so at parser runtime, which (unlike with JNI) does not occur under a JVM. If we provide a .so, I would think it should be distributed separately.







[GitHub] [incubator-daffodil] stevedlawrence commented on pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#issuecomment-701399356


   Ah, that's my fault. Perhaps recreating the branch will reopen the pull request? Feel free to push that branch again; I'm not sure which commit it was on. Though I'm not sure we really need a new branch unless the plan is to merge it in a sort of experimental state. Is the goal here a preliminary review with much more work to be done, or do you think it's ready to merge into a branch that has more visibility and allows more development, but maybe isn't ready to be merged into master yet?
   
   As for dropping 2.11, that's probably something we should start a thread on the dev list for. I believe the only reason we maintained that was for Apache Spark, which until recently only supported 2.11. But the latest versions of Spark support 2.12, so it should probably be fine to drop.





[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r497734489



##########
File path: daffodil-cli/build.sbt
##########
@@ -42,6 +42,18 @@ mappings in Universal ++= Seq(
   baseDirectory.value / "README.md" -> "README.md",
 )
 
+mappings in Universal ++= Seq(
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "common_runtime.h" -> "include/common_runtime.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "daffodil_argp.h" -> "include/daffodil_argp.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "stack.h" -> "include/stack.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "xml_reader.h" -> "include/xml_reader.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "xml_writer.h" -> "include/xml_writer.h",
+)
+
+mappings in Universal ++= Seq(
+  baseDirectory.value / ".." / "daffodil-runtime2" / "target" / "streams" / "compile" / "ccTargetMap" / "_global" / "streams" / "compile" / "sbtcc.Library" / "libruntime2.a" -> "lib/libruntime2.a",

Review comment:
       Hmmm. What does this line do? It mentions "target" explicitly, which is very odd to me. 

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns
+         |        "$ns", // namedQName.ns
+         |    },""".stripMargin
+    qnameInit
+  }
+
+  def addComplexTypeERD(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val count = structs.top.declarations.length
+    val offsetComputations = structs.top.offsetComputations.mkString(",\n")
+    val erdComputations = structs.top.erdComputations.mkString(",\n")
+    val qnameInit = defineQNameInit(context)
+    val complexERD =
+      s"""static const $C ${C}_compute_ERD_offsets;
+         |
+         |static const ptrdiff_t ${C}_offsets[$count] = {
+         |$offsetComputations
+         |};
+         |
+         |static const ERD *${C}_childrenERDs[$count] = {
+         |$erdComputations
+         |};
+         |
+         |static const ERD ${C}_ERD = {
+         |$qnameInit
+         |    COMPLEX,                         // typeCode
+         |    $count,                               // numChildren
+         |    ${C}_offsets,                      // offsets
+         |    ${C}_childrenERDs,                 // childrenERDs
+         |    (ERDInitSelf)&${C}_initSelf,       // initSelf
+         |    (ERDParseSelf)&${C}_parseSelf,     // parseSelf
+         |    (ERDUnparseSelf)&${C}_unparseSelf, // unparseSelf
+         |};
+         |""".stripMargin
+    erds += complexERD
+  }
+
+  def addStruct(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val declarations = structs.top.declarations.mkString("\n")
+    val struct =
+      s"""typedef struct $C
+         |{
+         |    InfosetBase _base;
+         |$declarations
+         |} $C;
+         |""".stripMargin
+    finalStructs += struct
+    val initStatement = s"    instance->_base.erd = &${C}_ERD;"
+    structs.top.initStatements += initStatement
+  }
+
+  def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def addComplexTypeStatements(child: ElementBase): Unit = {
+    val C = child.namedQName.local
+    val e = child.name
+    val initStatement = s"    ${C}_initSelf(&instance->$e);"
+    val parseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_parseSelf(&instance->$e, pstate);
+         |    }""".stripMargin
+    val unparseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_unparseSelf(&instance->$e, ustate);
+         |    }""".stripMargin
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def pushComplexElement(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    structs.push(new ComplexCGState(C))
+  }
+
+  def popComplexElement(context: ElementBase): Unit = {
+    structs.pop()
+  }
+
+  def addSimpleTypeERD(context: ElementBase): Unit = {
+    val e = context.namedQName.local
+    val qnameInit = defineQNameInit(context)
+    val typeCode = context.optPrimType.get match {
+      case PrimType.Int => "PRIMITIVE_INT32"
+      case PrimType.String => "PRIMITIVE_STRING"
+      case p: PrimType => context.SDE("PrimType %s not supported yet.", p.toString)
+    }
+    val erd =
+      s"""static const ERD ${e}_ERD = {
+         |$qnameInit
+         |    $typeCode, // typeCode
+         |    0,               // numChildren
+         |    NULL,            // offsets
+         |    NULL,            // childrenERDs
+         |    NULL,            // initSelf
+         |    NULL,            // parseSelf
+         |    NULL,            // unparseSelf
+         |};
+         |""".stripMargin
+    erds += erd
+    addComputations(context)
+  }
+
+  def addComputations(child: ElementBase): Unit = {
+    val C = structs.top.C
+    val e = child.namedQName.local
+    val offsetComputation = s"    (char *)&${C}_compute_ERD_offsets.$e - (char *)&${C}_compute_ERD_offsets"
+    val erdComputation = s"    &${e}_ERD"
+    structs.top.offsetComputations += offsetComputation
+    structs.top.erdComputations += erdComputation
+  }
+
+  def addFieldDeclaration(context: ThrowsSDE, child: ElementBase): Unit = {
+    val definition = if (child.isSimpleType) {
+      import NodeInfo.PrimType
+      child.optPrimType.get match {
+        case PrimType.Long => "int64_t    "
+        case PrimType.Int => "int32_t    "
+        case x => context.SDE("Unsupported primitive type: " + x)
+      }
+    } else {
+      child.namedQName.local + "         "
+    }
+    structs.top.declarations += s"    $definition ${child.name};"
+  }
+
+  def viewCodeHeader: String = {

Review comment:
       Why are these named with a "view" prefix? Something like "gen" or "generate" would be better.

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *number_from_xml = mxmlGetOpaque(reader->node);
+    const char *name_from_infoset = erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);

Review comment:
       Here too, shouldn't this check be before the walk call?
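   
   A minimal sketch of the check-then-walk ordering, assuming a hypothetical `Cursor` over an array of element names in place of `XMLReader` and the mxml node tree (`Cursor` and `expect_element` are illustrative names, not part of this PR). Checking first leaves the cursor on the offending element whenever a mismatch is reported:
   
   ```c
   #include <stddef.h> // for size_t, NULL
   #include <string.h> // for strcmp
   
   // Hypothetical cursor over element names in document order;
   // it stands in for XMLReader and the mxml node tree
   typedef struct Cursor
   {
       const char *const *names;
       size_t             count;
       size_t             pos;
   } Cursor;
   
   // Check the current element first and advance only when it matches
   static const char *
   expect_element(Cursor *cursor, const char *expected)
   {
       if (cursor->pos >= cursor->count)
       {
           return "Ran out of XML data";
       }
       if (strcmp(cursor->names[cursor->pos], expected) != 0)
       {
           // The cursor has not moved, so it still identifies the bad element
           return "Found mismatch between XML data and infoset";
       }
       cursor->pos++;
       return NULL;
   }
   ```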

##########
File path: build.sbt
##########
@@ -43,6 +46,32 @@ lazy val runtime1         = Project("daffodil-runtime1", file("daffodil-runtime1
                               .dependsOn(io, lib % "test->test", udf, macroLib % "compile-internal, test-internal")
                               .settings(commonSettings, usesMacros)
 
+val runtime2StaticLib     = Library("libruntime2.a")
+lazy val runtime2         = Project("daffodil-runtime2", file("daffodil-runtime2")).configs(IntegrationTest)
+                              .enablePlugins(CcPlugin)
+                              .dependsOn(tdmlProc)
+                              .settings(commonSettings)
+                              .settings(publishArtifact in (Compile, packageDoc) := false)
+                              .settings(
+                                Compile / ccTargets := ListSet(runtime2StaticLib),
+                                Compile / cSources  := Map(
+                                  runtime2StaticLib -> Seq(
+                                    baseDirectory.value / "src" / "main" / "c" / "common_runtime.c",

Review comment:
       It seems surprising to me that we have to list every file here that goes into the static lib. Either we maintain this list by hand, or perhaps we create a subdir with rules for packaging whatever C code is found there? Or maybe runtime2StaticLib becomes a separate subproject from the rest of runtime2, so that all of its files can go into the static library?
   
   I'm not suggesting we fix this right now. It's just something to think about because maintaining this will be a pain. 

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/SequenceCombinator.scala
##########
@@ -91,6 +92,16 @@ class OrderedSequence(sq: SequenceTermBase, sequenceChildrenArg: Seq[SequenceChi
       }
     }
   }
+
+  override def generateCode(cgState: CodeGeneratorState): Unit = {
+    //
+    // To lift this draconian restriction, we have to
+    // generate code for each of the children, and combine them into a block
+    //
+    sq.schemaDefinitionUnless(sequenceChildren.length == 1, "Only a single child of a sequence is supported.")

Review comment:
       Oh, I thought this restriction had been taken out, and that we had tests with nested complex types whose leaf simple types are 32-bit ints, the only type we support.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/primitives/ElementCombinator.scala
##########
@@ -280,8 +300,11 @@ case class CaptureValueLengthEnd(ctxt: ElementBase)
       new CaptureEndOfValueLengthUnparser(ctxt.erd)
     else
       new NadaUnparser(ctxt.erd)
-}
 
+  override def generateCode(state: CodeGeneratorState): Unit = {
+    // Not generating code here

Review comment:
       These are not yet supported. But since they get put into the grammar tree (probably by mistake, a missed optimization), for now they must not fail with an SDE here.
   
   I think these should issue an SDW saying they are not supported. When we fix the bug that puts these things in the grammar at all, the warnings will go away.
   
   I realize the base method being overridden does a generic "not supported" SDE. 

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/GeneratedCodeCompiler.scala
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import org.apache.commons.io.FileUtils
+import org.apache.daffodil.compiler.ProcessorFactory
+import org.apache.daffodil.dsom.SchemaDefinitionError
+import org.apache.daffodil.runtime2.generators.CodeGeneratorState
+import org.apache.daffodil.util.Misc
+import os.Pipe
+
+class GeneratedCodeCompiler(pf: ProcessorFactory) {
+  private var executableFile: os.Path = _
+  private lazy val isWindows = System.getProperty("os.name").toLowerCase().startsWith("windows")
+
+  /**
+   * Compiles and links generated C code with runtime2 library to
+   * build an executable file.
+   */
+  def compile(rootElementName: String, codeGeneratorState: CodeGeneratorState): Unit = {
+    val compiler = "cc"
+    val location = Option(this.getClass.getProtectionDomain.getCodeSource) flatMap (x => Option(x.getLocation))
+    val wd = if (os.exists(os.pwd/"daffodil-runtime2"))
+      os.pwd
+    else if (os.exists(os.pwd/os.up/"daffodil-runtime2"))
+      os.pwd/os.up
+    else if (location.isDefined)
+      os.Path(FileUtils.toFile(location.get))/os.up/os.up
+    else
+      os.pwd
+    val includeDir = if (os.exists(wd/"include"))
+      wd/"include"
+    else
+      wd/"daffodil-runtime2"/"src"/"main"/"c"
+    val libDir = if (os.exists(wd/"lib"))
+      wd/"lib"
+    else
+      wd/"daffodil-runtime2"/"target"/"streams"/"compile"/"ccTargetMap"/"_global"/"streams"/"compile"/"sbtcc.Library"

Review comment:
       Shouldn't the sbt-cc plugin take care of this sort of thing? Do we need this so we can run TDML tests?

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)

Review comment:
       Best not to convert things into null pointers.
   You have Option types here, so use those.
   E.g.,
   ```
   val optXMLPrefixDefinition = context.namedQName.prefix.map{ pre => s"xmlns:${pre}"}
   ```
   I think there are two possibilities here: no namespace, or a namespace.
   Within the namespace case there are two more: a prefix, or the default namespace.
   So I think this can be simplified, with no possible null pointer exceptions.
   

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns
+         |        "$ns", // namedQName.ns
+         |    },""".stripMargin
+    qnameInit
+  }
+
+  def addComplexTypeERD(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val count = structs.top.declarations.length
+    val offsetComputations = structs.top.offsetComputations.mkString(",\n")
+    val erdComputations = structs.top.erdComputations.mkString(",\n")
+    val qnameInit = defineQNameInit(context)
+    val complexERD =
+      s"""static const $C ${C}_compute_ERD_offsets;
+         |
+         |static const ptrdiff_t ${C}_offsets[$count] = {
+         |$offsetComputations
+         |};
+         |
+         |static const ERD *${C}_childrenERDs[$count] = {
+         |$erdComputations
+         |};
+         |
+         |static const ERD ${C}_ERD = {
+         |$qnameInit
+         |    COMPLEX,                         // typeCode
+         |    $count,                               // numChildren
+         |    ${C}_offsets,                      // offsets
+         |    ${C}_childrenERDs,                 // childrenERDs
+         |    (ERDInitSelf)&${C}_initSelf,       // initSelf
+         |    (ERDParseSelf)&${C}_parseSelf,     // parseSelf
+         |    (ERDUnparseSelf)&${C}_unparseSelf, // unparseSelf
+         |};
+         |""".stripMargin
+    erds += complexERD
+  }
+
+  def addStruct(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val declarations = structs.top.declarations.mkString("\n")
+    val struct =
+      s"""typedef struct $C
+         |{
+         |    InfosetBase _base;
+         |$declarations
+         |} $C;
+         |""".stripMargin
+    finalStructs += struct
+    val initStatement = s"    instance->_base.erd = &${C}_ERD;"
+    structs.top.initStatements += initStatement
+  }
+
+  def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def addComplexTypeStatements(child: ElementBase): Unit = {
+    val C = child.namedQName.local
+    val e = child.name
+    val initStatement = s"    ${C}_initSelf(&instance->$e);"
+    val parseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_parseSelf(&instance->$e, pstate);
+         |    }""".stripMargin
+    val unparseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_unparseSelf(&instance->$e, ustate);
+         |    }""".stripMargin
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def pushComplexElement(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    structs.push(new ComplexCGState(C))
+  }
+
+  def popComplexElement(context: ElementBase): Unit = {
+    structs.pop()
+  }
+
+  def addSimpleTypeERD(context: ElementBase): Unit = {
+    val e = context.namedQName.local
+    val qnameInit = defineQNameInit(context)
+    val typeCode = context.optPrimType.get match {
+      case PrimType.Int => "PRIMITIVE_INT32"
+      case PrimType.String => "PRIMITIVE_STRING"
+      case p: PrimType => context.SDE("PrimType %s not supported yet.", p.toString)
+    }
+    val erd =
+      s"""static const ERD ${e}_ERD = {
+         |$qnameInit
+         |    $typeCode, // typeCode
+         |    0,               // numChildren
+         |    NULL,            // offsets
+         |    NULL,            // childrenERDs
+         |    NULL,            // initSelf
+         |    NULL,            // parseSelf
+         |    NULL,            // unparseSelf
+         |};
+         |""".stripMargin
+    erds += erd
+    addComputations(context)
+  }
+
+  def addComputations(child: ElementBase): Unit = {
+    val C = structs.top.C
+    val e = child.namedQName.local
+    val offsetComputation = s"    (char *)&${C}_compute_ERD_offsets.$e - (char *)&${C}_compute_ERD_offsets"
+    val erdComputation = s"    &${e}_ERD"
+    structs.top.offsetComputations += offsetComputation
+    structs.top.erdComputations += erdComputation
+  }
+
+  def addFieldDeclaration(context: ThrowsSDE, child: ElementBase): Unit = {
+    val definition = if (child.isSimpleType) {
+      import NodeInfo.PrimType
+      child.optPrimType.get match {
+        case PrimType.Long => "int64_t    "
+        case PrimType.Int => "int32_t    "
+        case x => context.SDE("Unsupported primitive type: " + x)
+      }
+    } else {
+      child.namedQName.local + "         "
+    }
+    structs.top.declarations += s"    $definition ${child.name};"
+  }
+
+  def viewCodeHeader: String = {
+    val structs = finalStructs.mkString("\n")
+    val header =
+      s"""#ifndef GENERATED_CODE_H
+         |#define GENERATED_CODE_H
+         |
+         |#include "common_runtime.h" // for InfosetBase
+         |#include <stdint.h>         // for int32_t
+         |
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |extern InfosetBase *rootInfoset();
+         |
+         |// Define some infoset structures
+         |
+         |$structs
+         |#endif // GENERATED_CODE_H
+         |""".stripMargin
+    header
+  }
+
+  def viewCodeFile(rootElementName: String): String = {
+    val prototypes = this.prototypes.mkString("\n")
+    val erds = this.erds.mkString("\n")
+    val finalImplementation = this.finalImplementation.mkString("\n")
+    val code =
+      s"""#include "generated_code.h"
+         |#include <endian.h> // for be32toh, htobe32
+         |#include <errno.h>  // for errno
+         |#include <stddef.h> // for ptrdiff_t
+         |#include <stdio.h>  // for NULL, fread, fwrite, size_t, feof, ferror, FILE
+         |#include <string.h> // for strerror
+         |
+         |// Prototypes needed for compilation
+         |
+         |$prototypes
+         |
+         |// Metadata singletons
+         |
+         |$erds
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |InfosetBase *
+         |rootInfoset()
+         |{
+         |    static $rootElementName    instance;
+         |    InfosetBase *root = &instance._base;
+         |    ${rootElementName}_ERD.initSelf(root);
+         |    return root;
+         |}
+         |
+         |// Methods to initialize, parse, and unparse infoset nodes
+         |
+         |static const char *

Review comment:
       This eof_or_error_msg can be static code in libruntime2.a. It has no substitutions going into it.
   However, it's possible the string should have the current element name appended to it. In that case the function name would also need to be element-specific, and the calls to it from parseSelf and unparseSelf would have to take that into consideration.
   
   Maybe leave it as generic static code (and put it in the library), but pass both the FILE* and the ERD to it, so it can get the element name/context info from the ERD?
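   
   One possible shape for such a library function, as a sketch only: the `NamedQName` and `ERD` definitions below are pared-down hypothetical stand-ins for the real runtime2 structs, and the message wording is invented for illustration. The static buffer avoids allocation but makes the function non-reentrant.
   
   ```c
   #include <stdio.h> // for FILE, feof, ferror, snprintf, NULL
   
   // Hypothetical stand-ins holding only the field this sketch needs
   typedef struct NamedQName
   {
       const char *name;
   } NamedQName;
   
   typedef struct ERD
   {
       NamedQName namedQName;
   } ERD;
   
   // Return NULL if the stream is healthy, else a message naming the element
   static const char *
   eof_or_error_msg(FILE *stream, const ERD *erd)
   {
       static char msg[128]; // shared buffer, overwritten on each failure
       if (feof(stream))
       {
           snprintf(msg, sizeof(msg), "Found eof indicator in element %s",
                    erd->namedQName.name);
           return msg;
       }
       if (ferror(stream))
       {
           snprintf(msg, sizeof(msg), "Found error indicator in element %s",
                    erd->namedQName.name);
           return msg;
       }
       return NULL;
   }
   ```
   
   Because the element name comes from the ERD at run time, one generic function could serve every generated parseSelf and unparseSelf.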

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns

Review comment:
       So the qname initializer either is just one string, or it is 3 strings. The qname structure always has 3 fields. If the initializer is short, are the other members initialized to null, or just left random? We need them to be null. 
   
   Also, it appears the string "xmlns" in the 2nd member of this struct means "there is a namespace, but no prefix for it". I would use the string "default namespace" for that to be more explicit, or just null.
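   
   On the initializer question: standard C implicitly initializes any members omitted from a brace-enclosed initializer the same way as objects with static storage duration, so omitted pointer members become null pointers rather than garbage. A small sketch, assuming a three-member `NamedQName` layout inferred from the generated comments (the real struct in the runtime2 headers may differ; `has_namespace` and the example values are illustrative only):
   
   ```c
   #include <stddef.h> // for NULL
   
   // Assumed layout, mirroring the name/xmlns/ns comments in the generated code
   typedef struct NamedQName
   {
       const char *name;
       const char *xmlns;
       const char *ns;
   } NamedQName;
   
   // Short form: xmlns and ns are implicitly initialized to null pointers
   static const NamedQName short_init = {"e1"};
   
   // Long form: all three members given explicitly
   static const NamedQName full_init = {"ex:e1", "xmlns:ex", "http://example.com"};
   
   // A null ns can therefore serve as the "no namespace" marker
   static int
   has_namespace(const NamedQName *qname)
   {
       return qname->ns != NULL;
   }
   ```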

##########
File path: daffodil-core/src/test/scala/org/apache/daffodil/runtime2/TestGeneratedCodeCompiler.scala
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.ByteArrayInputStream
+
+import org.apache.daffodil.compiler.Compiler
+import org.apache.daffodil.infoset.DIComplex
+import org.apache.daffodil.infoset.TestInfoset
+import org.apache.daffodil.util.Misc
+import org.apache.daffodil.util.SchemaUtils
+import org.junit.Test
+
+// Now that we can run TDML tests with runtime2, this test's remaining
+// value is for debugging of runtime2 components.
+class TestGeneratedCodeCompiler {
+
+  @Test
+  def compileRunParseInt32(): Unit = {
+    // Compile a DFDL schema to parse int32 numbers
+    val testSchema = SchemaUtils.dfdlTestSchema(
+        <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>,
+        <dfdl:format representation="binary" ref="GeneralFormat"/>,
+      <xs:element name="c1">
+        <xs:complexType>
+          <xs:sequence>
+            <xs:element name="e1" type="xs:int"/>
+            <xs:element name="c2">
+              <xs:complexType>
+                <xs:sequence>
+                  <xs:element name="e2" type="xs:int"/>

Review comment:
       So here we have a complex type whose sequence contains two ints. But above there's a comment on code that insists on only one child for a sequence. 
   
   Those are not consistent. If this test works, that code above must be dead code. 
   
   ???

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/api/DFDLParserUnparser.scala
##########
@@ -202,6 +204,14 @@ object DFDL {
     def parse(ab: Array[Byte]): Unit
   }
 
+  trait CodeGeneratorState {

Review comment:
       I don't think we need this. This state object is state of the compiler/generator that emits C code. That's part of daffodil core, not daffodil-runtime2. 

##########
File path: daffodil-core/src/test/scala/org/apache/daffodil/runtime2/TestGeneratedCodeCompiler.scala
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.ByteArrayInputStream
+
+import org.apache.daffodil.compiler.Compiler
+import org.apache.daffodil.infoset.DIComplex
+import org.apache.daffodil.infoset.TestInfoset
+import org.apache.daffodil.util.Misc
+import org.apache.daffodil.util.SchemaUtils
+import org.junit.Test
+
+// Now that we can run TDML tests with runtime2, this test's remaining
+// value is for debugging of runtime2 components.
+class TestGeneratedCodeCompiler {
+
+  @Test
+  def compileRunParseInt32(): Unit = {
+    // Compile a DFDL schema to parse int32 numbers
+    val testSchema = SchemaUtils.dfdlTestSchema(
+        <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>,
+        <dfdl:format representation="binary" ref="GeneralFormat"/>,
+      <xs:element name="c1">
+        <xs:complexType>
+          <xs:sequence>
+            <xs:element name="e1" type="xs:int"/>
+            <xs:element name="c2">
+              <xs:complexType>
+                <xs:sequence>
+                  <xs:element name="e2" type="xs:int"/>
+                  <xs:element name="e3" type="xs:int"/>
+                </xs:sequence>
+              </xs:complexType>
+            </xs:element>
+          </xs:sequence>
+        </xs:complexType>
+      </xs:element>)
+    val schemaCompiler = Compiler()
+    val pf = schemaCompiler.compileNode(testSchema)
+    assert(!pf.isError, pf.getDiagnostics.map(_.getMessage()).mkString("\n"))
+    // Generate C code from the DFDL schema
+    val codeGeneratorState = pf.generateCode()
+    val generatedCodeCompiler = new GeneratedCodeCompiler(pf)
+    val rootElementName = "c1"
+    generatedCodeCompiler.compile(rootElementName, codeGeneratorState)
+    assert(!pf.isError, pf.getDiagnostics.map(_.getMessage()).mkString("\n"))
+    // Run the executable to parse int32 numbers
+    val dp = generatedCodeCompiler.dataProcessor
+    val b = Misc.hex2Bytes("000000010000000200000003")
+    val input = new ByteArrayInputStream(b)
+    val pr = dp.parse(input)
+    assert(!pr.isError && pf.getDiagnostics.isEmpty, pr.getDiagnostics.map(_.getMessage()).mkString("\n"))
+    // Create an internal Daffodil infoset from the XML file
+    val (infoset: DIComplex, _, tunables) = TestInfoset.testInfoset(testSchema, pr.infosetAsXML)
+    assert(infoset.hasVisibleChildren)
+    assert(infoset.erd.name == "c1")

Review comment:
       We should explore the infoset more here, and ensure it matches the schema. In principle we could validate the data against the schema rather than do some ad hoc exploration of it. 

##########
File path: daffodil-runtime2/src/main/c/common_runtime.c
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "common_runtime.h"
+
+// walkInfosetNode - recursively walk an infoset node and call
+// VisitEventHandler methods
+
+static const char *
+walkInfosetNode(const VisitEventHandler *handler, const InfosetBase *infoNode)

Review comment:
       Is there some convention for function comments that enables javadoc-like tools to create HTML doc out of them?
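   
   One widely used convention for C is Doxygen, which reads specially marked comment blocks and generates HTML from them. A minimal sketch of that comment layout, using a hypothetical helper (countNames) rather than the real walkInfosetNode, and assuming the project would choose Doxygen at all:
   
   ```c
   #include <stddef.h> // for NULL, size_t

   /**
    * @brief Count the non-null entries in an array of strings.
    *
    * Hypothetical helper, shown only to illustrate the comment layout.
    *
    * @param names array of string pointers (entries may be NULL)
    * @param count number of entries in @p names
    * @return number of entries that are not NULL
    */
   static size_t
   countNames(const char *const *names, size_t count)
   {
       size_t found = 0;
       for (size_t i = 0; i < count; i++)
       {
           if (names[i] != NULL)
           {
               found++;
           }
       }
       return found;
   }
   ```
   
   The block opens with a slash and two asterisks, which is what marks it as documentation, and the tags describe the parameters and return value.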

##########
File path: daffodil-runtime2/src/main/c/daffodil_argp.h
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef DAFFODIL_ARGP_H
+#define DAFFODIL_ARGP_H
+
+// Parse our "daffodil" command line interface
+
+extern int parse_daffodil_cli(int argc, char **argv);
+
+// Get our "daffodil" CLI options
+
+extern struct daffodil_cli
+{
+    enum daffodil_subcommand
+    {
+        DAFFODIL_NONE,
+        DAFFODIL_PARSE,
+        DAFFODIL_UNPARSE
+    } subcommand;
+    int verbosity;
+} daffodil_cli;
+
+// Get our "daffodil parse" CLI options
+
+extern struct daffodil_parse_cli
+{
+    const char *infoset_type;

Review comment:
       Can we call this infoset_converter? Or external_infoset_rep? 
   
   The C infoset is the structs that extend InfosetBase. We convert that infoset to XML (or eventually other things if we want) for the CLI, to make the data tangible for users learning DFDL, and for TDML test-runner reasons.
   
   This is again about keeping everything that converts the C infoset to/from JSON, XML, etc. separate from the "pure" runtime, which should have no code specific to XML, JSON, etc. 

##########
File path: daffodil-runtime2/src/main/c/generated_code.h
##########
@@ -0,0 +1,44 @@
+/*

Review comment:
       Also belongs in daffodil-runtime2/src/test/c/handCraftedExample1.h

##########
File path: daffodil-runtime2/src/main/c/daffodil_main.c
##########
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "common_runtime.h" // for walkInfoset, InfosetBase, ERD, ...
+#include "daffodil_argp.h"  // for daffodil_cli, parse_daffodil_cli, ...
+#include "generated_code.h" // for rootInfoset

Review comment:
       Do we actually generate this? I think this should be called root.h.
   
   I am assuming the actual generated code defines the rootInfoset() function, so that function has a prototype here, but no implementation. 
   

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)

Review comment:
       Shouldn't this test be before we call mxmlWalkNext(...) ?

##########
File path: daffodil-runtime2/src/main/c/daffodil_argp.c
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "daffodil_argp.h" // for daffodil_cli, daffodil_parse_cli, ...
+#include <argp.h>          // for argp_state, argp_error, error_t, argp_parse
+#include <stdio.h>         // for sprintf
+#include <stdlib.h>        // for putenv, NULL
+#include <string.h>        // for strlen, strcmp
+
+// Initialize our "daffodil" name and version
+
+const char *argp_program_version = "Apache Daffodil (runtime2) 0.1";
+
+// Initialize our "daffodil parse" CLI options
+
+struct daffodil_parse_cli daffodil_parse = {
+    "xml", // default infoset type
+    "-",   // default infile
+    "-",   // default outfile
+};
+
+static const struct argp_option parse_options[] = {
+    {"infoset-type", 'I', "<infoset_type>", 0,
+     "Infoset type to output. Must be one of 'xml' or 'null'"},
+
+    {"output", 'o', "<file>", 0,
+     "Write output to a given file. If not given or is -, output is written to "
+     "stdout"},
+
+    {0}};
+
+static error_t parse_handler(int key, char *arg, struct argp_state *state);
+
+static const char parse_args_doc[] = "[infile]";
+
+static const char parse_doc[] =
+    "\n"
+    "Parse a file using a DFDL schema\n"
+    "\n"
+    "Parse Options:"
+    "\v"
+    " Trailing arguments:\n"
+    "  infile (not required)      input file to parse. "
+    "If not specified, or a value of -, reads from stdin";
+
+static const struct argp parse_argp = {
+    parse_options,  // array of CLI options
+    parse_handler,  // function to get these CLI options
+    parse_args_doc, // short usage documentation
+    parse_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil parse" CLI options
+
+static error_t
+parse_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_parse_cli *parse = state->input;
+
+    switch (key)
+    {
+    case 'I':
+        parse->infoset_type = arg;
+        break;
+
+    case 'o':
+        parse->outfile = arg;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (state->arg_num)
+        {
+            argp_error(state, "too many arguments: %s", arg);
+        }
+        parse->infile = arg;
+        break;
+
+    default:
+        return ARGP_ERR_UNKNOWN;
+    }
+
+    return 0;
+}
+
+// Parse our "daffodil parse" command line interface
+
+static error_t
+parse_daffodil_parse_cli(struct argp_state *state)
+{
+    int    argc = state->argc - state->next + 1;
+    char **argv = &state->argv[state->next - 1];
+    char * old_cmd = argv[0];
+    char   new_cmd[strlen(state->name) + strlen(" parse") + 1];
+
+    sprintf(new_cmd, "%s parse", state->name);
+    argv[0] = new_cmd;
+
+    error_t status = argp_parse(&parse_argp, argc, argv, ARGP_IN_ORDER, &argc,
+                                &daffodil_parse);
+
+    argv[0] = old_cmd;
+    state->next += argc - 1;
+
+    return status;
+}
+
+// Initialize our "daffodil unparse" CLI options
+
+struct daffodil_unparse_cli daffodil_unparse = {
+    "xml", // default infoset type
+    "-",   // default infile
+    "-",   // default outfile
+};
+
+static const struct argp_option unparse_options[] = {
+    {"infoset-type", 'I', "<infoset_type>", 0,
+     "Infoset type to unparse. Must be 'xml'"},
+
+    {"output", 'o', "<file>", 0,
+     "Write output to file. If not given or is -, output is written to "
+     "standard output"},
+
+    {0}};
+
+static error_t unparse_handler(int key, char *arg, struct argp_state *state);
+
+static const char unparse_args_doc[] = "[infile]";
+
+static const char unparse_doc[] =
+    "\n"
+    "Unparse an infoset file using a DFDL schema\n"
+    "\n"
+    "Unparse Options:"
+    "\v"
+    " Trailing arguments:\n"
+    "  infile (not required)      input file to unparse. If not specified, or "
+    "a value of -, reads from stdin";
+
+static const struct argp unparse_argp = {
+    unparse_options,  // array of CLI options
+    unparse_handler,  // function to get these CLI options
+    unparse_args_doc, // short usage documentation
+    unparse_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil unparse" CLI options
+
+static error_t
+unparse_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_unparse_cli *unparse = state->input;
+
+    switch (key)
+    {
+    case 'I':
+        unparse->infoset_type = arg;
+        break;
+
+    case 'o':
+        unparse->outfile = arg;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (state->arg_num)
+        {
+            argp_error(state, "too many arguments: %s", arg);
+        }
+        unparse->infile = arg;
+        break;
+
+    default:
+        return ARGP_ERR_UNKNOWN;
+    }
+
+    return 0;
+}
+
+// Parse our "daffodil unparse" command line interface
+
+static error_t
+parse_daffodil_unparse_cli(struct argp_state *state)
+{
+    int    argc = state->argc - state->next + 1;
+    char **argv = &state->argv[state->next - 1];
+    char * old_cmd = argv[0];
+    char   new_cmd[strlen(state->name) + strlen(" unparse") + 1];
+
+    sprintf(new_cmd, "%s unparse", state->name);
+    argv[0] = new_cmd;
+
+    error_t status = argp_parse(&unparse_argp, argc, argv, ARGP_IN_ORDER, &argc,
+                                &daffodil_unparse);
+
+    argv[0] = old_cmd;
+    state->next += argc - 1;
+
+    return status;
+}
+
+// Initialize our "daffodil" CLI options
+
+struct daffodil_cli daffodil_cli = {
+    DAFFODIL_NONE, // default subcommand
+    0,             // default verbosity
+};
+
+static const struct argp_option daffodil_options[] = {
+    {"verbose", 'v', 0, 0, "Increment verbosity level, one level for each -v",
+     -1},
+
+    {0}};
+
+static error_t daffodil_handler(int key, char *arg, struct argp_state *state);
+
+static const char daffodil_args_doc[] = "<subcommand> [SUBCOMMAND_OPTION...]";
+
+static const char daffodil_doc[] =
+    "\n"
+    "Global Options:"
+    "\v"
+    "Subcommands:\n"
+    "  parse         Parse data to a DFDL infoset\n"
+    "  unparse       Unparse a DFDL infoset\n"
+    "\n"
+    "Run 'daffodil <subcommand> --help' for subcommand specific options";
+
+static const struct argp daffodil_argp = {
+    daffodil_options,  // array of CLI options
+    daffodil_handler,  // function to get these CLI options
+    daffodil_args_doc, // short usage documentation
+    daffodil_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil" CLI options
+
+static error_t
+daffodil_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_cli *daffodil = state->input;
+    error_t              status = 0;
+
+    switch (key)
+    {
+    case 'v':
+        daffodil->verbosity++;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (strcmp(arg, "parse") == 0)
+        {
+            daffodil->subcommand = DAFFODIL_PARSE;
+            status = parse_daffodil_parse_cli(state);

Review comment:
       Ok, way too many overloadings of "parse" here. Let's not use the term "parse" for anything but DFDL-driven parsing. If we're parsing the arguments to the CLI, can we call that cliArgParse ?
   
   Is there a getopt-style library for C that we can use to shorten this code? That seems like a lot of code just to implement a command line args system.  
   
   Perhaps this is just that I am not used to how verbose C is and how many lines of code it takes to implement things. 
   
   I think putting "daffodil" in these function names just makes them longer, and doesn't add clarity. Or.... is that some sort of convention that "daffodil" functions are considered the "public" ones of this code file, and the others are internal/private?
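   
   For comparison, getopt_long (declared in getopt.h on glibc and the BSDs) is the usual lighter-weight alternative to argp. A minimal sketch, assuming only the -I, -o and -v options from this file; the struct cli_options and the function read_cli_options are hypothetical names, and the sketch does not handle subcommands or generate --help text the way argp does:
   
   ```c
   #include <getopt.h> // for getopt_long, option, optarg, optind
   #include <stddef.h> // for NULL

   // Hypothetical container for the options; not a type from this pull request
   struct cli_options
   {
       const char *infoset_type;
       const char *infile;
       const char *outfile;
       int         verbosity;
   };

   // Read the options from argv into opts; returns 0 on success and -1 on
   // a usage error (getopt_long itself prints a message for unknown options)
   static int
   read_cli_options(int argc, char **argv, struct cli_options *opts)
   {
       static const struct option long_options[] = {
           {"infoset-type", required_argument, NULL, 'I'},
           {"output", required_argument, NULL, 'o'},
           {"verbose", no_argument, NULL, 'v'},
           {NULL, 0, NULL, 0}};
       int key;

       // Same defaults as the argp-based code
       opts->infoset_type = "xml";
       opts->infile = "-";
       opts->outfile = "-";
       opts->verbosity = 0;

       optind = 1; // start scanning at the first argument
       while ((key = getopt_long(argc, argv, "I:o:v", long_options, NULL)) != -1)
       {
           switch (key)
           {
           case 'I':
               opts->infoset_type = optarg;
               break;
           case 'o':
               opts->outfile = optarg;
               break;
           case 'v':
               opts->verbosity++;
               break;
           default:
               return -1;
           }
       }

       // Allow at most one trailing argument, the input file
       if (argc - optind > 1)
       {
           return -1;
       }
       if (optind < argc)
       {
           opts->infile = argv[optind];
       }
       return 0;
   }
   ```
   
   The trade-off is that the usage and help strings would have to be written by hand, which is part of what the argp tables above provide.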
   

##########
File path: daffodil-runtime2/src/main/c/common_runtime.h
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef COMMON_RUNTIME_H
+#define COMMON_RUNTIME_H
+
+#include <stddef.h> // for ptrdiff_t
+#include <stdint.h> // for int32_t
+#include <stdio.h>  // for FILE, size_t
+
+// Prototypes needed for compilation
+
+typedef struct ElementRuntimeData ERD;
+typedef struct InfosetBase        InfosetBase;
+typedef struct PState             PState;
+typedef struct UState             UState;
+typedef struct VisitEventHandler  VisitEventHandler;
+
+typedef void (*ERDInitSelf)(InfosetBase *infoNode);
+typedef const char *(*ERDParseSelf)(InfosetBase * infoNode,
+                                    const PState *pstate);
+typedef const char *(*ERDUnparseSelf)(const InfosetBase *infoNode,
+                                      const UState *     ustate);
+
+typedef const char *(*VisitStartDocument)(const VisitEventHandler *handler);
+typedef const char *(*VisitEndDocument)(const VisitEventHandler *handler);
+typedef const char *(*VisitStartComplex)(const VisitEventHandler *handler,
+                                         const InfosetBase *      base);
+typedef const char *(*VisitEndComplex)(const VisitEventHandler *handler,
+                                       const InfosetBase *      base);
+typedef const char *(*VisitInt32Elem)(const VisitEventHandler *handler,
+                                      const ERD *erd, const int32_t *location);
+
+// NamedQName - name of an infoset element
+
+typedef struct NamedQName
+{
+    char *name;  // element name (including prefix if any)
+    char *xmlns; // xmlns attribute name (including prefix if any)
+    char *ns;    // xmlns attribute value (a namespace URI)
+} NamedQName;
+
+// TypeCode - type of an infoset element
+
+enum TypeCode
+{
+    COMPLEX,
+    PRIMITIVE_INT32
+};
+
+// ERD - element runtime data needed to parse/unparse objects
+
+typedef struct ElementRuntimeData
+{
+    const NamedQName    namedQName;
+    const enum TypeCode typeCode;
+    const size_t        numChildren;
+    const ptrdiff_t *   offsets;
+    const ERD **        childrenERDs;
+
+    const ERDInitSelf    initSelf;
+    const ERDParseSelf   parseSelf;
+    const ERDUnparseSelf unparseSelf;

Review comment:
       Getting all this const stuff right is awesome! So these members can only be initialized, not assigned. Great. 
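   
   A minimal sketch of that initialize-only behaviour, using a hypothetical cut-down struct (MiniERD) instead of the real ERD:
   
   ```c
   #include <stddef.h> // for size_t

   // Hypothetical, cut-down stand-in for the ERD struct
   typedef struct MiniERD
   {
       const char *const name;
       const size_t      numChildren;
   } MiniERD;

   // The initializer is the only place the const members receive values
   static const MiniERD e1ERD = {
       .name = "e1",
       .numChildren = 0,
   };

   // Reading const members is always allowed
   static size_t
   childCount(const MiniERD *erd)
   {
       // erd->numChildren = 1; // rejected by the compiler: read-only member
       return erd->numChildren;
   }
   ```
   
   Uncommenting the assignment inside childCount makes the compiler reject the file, which is the guarantee the const qualifiers buy.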

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *number_from_xml = mxmlGetOpaque(reader->node);

Review comment:
       So the mxml library doesn't have an mxmlGetInt that does the strtol itself? Based on the way you've written this, it seems that every simple type must be read with mxmlGetOpaque and then converted to a value type. I would expect that if they have an "Opaque" call, they'd have complementary type-specific functions also. But maybe not.
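   
   Whichever accessor supplies the text, converting it to an int32_t needs a few checks around strtol. A minimal sketch using only the C standard library; the function name textToInt32 and the error strings are hypothetical and not taken from xml_reader.c, though the convention of returning NULL on success follows that file:
   
   ```c
   #include <errno.h>  // for errno, ERANGE
   #include <stddef.h> // for NULL
   #include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
   #include <stdlib.h> // for strtol

   // Hypothetical helper: convert text to int32_t, returning NULL on success
   // or an error message on failure; *location is written only on success
   static const char *
   textToInt32(const char *text, int32_t *location)
   {
       if (text == NULL || *text == '\0')
       {
           return "No number found";
       }

       char *endptr = NULL;
       errno = 0;
       long value = strtol(text, &endptr, 10);

       // Reject text with no digits or with trailing characters
       if (endptr == text || *endptr != '\0')
       {
           return "Found non-number characters";
       }
       // Reject values outside both long and int32_t ranges
       if (errno == ERANGE || value < INT32_MIN || value > INT32_MAX)
       {
           return "Number out of range for int32";
       }

       *location = (int32_t)value;
       return NULL;
   }
   ```
   
   The explicit INT32 range test matters on platforms where long is 64 bits, since strtol reports ERANGE only when the value does not fit in a long.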
   

##########
File path: daffodil-runtime2/src/main/c/common_runtime.h
##########
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef COMMON_RUNTIME_H
+#define COMMON_RUNTIME_H
+
+#include <stddef.h> // for ptrdiff_t
+#include <stdint.h> // for int32_t
+#include <stdio.h>  // for FILE, size_t
+
+// Prototypes needed for compilation
+
+typedef struct ElementRuntimeData ERD;
+typedef struct InfosetBase        InfosetBase;
+typedef struct PState             PState;
+typedef struct UState             UState;
+typedef struct VisitEventHandler  VisitEventHandler;
+
+typedef void (*ERDInitSelf)(InfosetBase *infoNode);
+typedef const char *(*ERDParseSelf)(InfosetBase * infoNode,
+                                    const PState *pstate);
+typedef const char *(*ERDUnparseSelf)(const InfosetBase *infoNode,
+                                      const UState *     ustate);
+
+typedef const char *(*VisitStartDocument)(const VisitEventHandler *handler);
+typedef const char *(*VisitEndDocument)(const VisitEventHandler *handler);
+typedef const char *(*VisitStartComplex)(const VisitEventHandler *handler,
+                                         const InfosetBase *      base);
+typedef const char *(*VisitEndComplex)(const VisitEventHandler *handler,
+                                       const InfosetBase *      base);
+typedef const char *(*VisitInt32Elem)(const VisitEventHandler *handler,
+                                      const ERD *erd, const int32_t *location);
+
+// NamedQName - name of an infoset element
+
+typedef struct NamedQName
+{
+    char *name;  // element name (including prefix if any)
+    char *xmlns; // xmlns attribute name (including prefix if any)

Review comment:
       What do we need this for? It feels very specific to the TDML runner and XML conversion. You are equipping the NamedQName structure with something for synthesizing xmlns:prefix="namespaceURI" declarations. That's only needed if you are synthesizing XML. From a purely DFDL perspective, we need a prefix string or null here so diagnostics can refer to the element name using the DFDL schema's terminology.
   
   It's hard to keep the things needed only for XML/TDML separate, but I think this is a small artifact of that.
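   
   A rough sketch of what a DFDL-only version of the structure might look like, assuming we store the prefix, local name, and namespace URI separately. The field names and the `qname_to_string` helper are hypothetical, not something in this pull request:
   
   ```c
   #include <stdio.h> // for NULL, size_t, snprintf
   
   // Hypothetical alternative to NamedQName: it keeps only DFDL-level
   // naming information and has no XML-specific xmlns field.
   typedef struct NamedQName
   {
       const char *prefix; // prefix from the DFDL schema (NULL if none)
       const char *local;  // local element name
       const char *ns;     // namespace URI (NULL if no namespace)
   } NamedQName;
   
   // Format "prefix:local" (or just "local") for use in diagnostics.
   // Returns the number of characters needed, as snprintf does.
   static int
   qname_to_string(const NamedQName *qname, char *buffer, size_t size)
   {
       if (qname->prefix != NULL)
       {
           return snprintf(buffer, size, "%s:%s", qname->prefix, qname->local);
       }
       return snprintf(buffer, size, "%s", qname->local);
   }
   ```
   
   With this layout, anything that needs an xmlns declaration can derive it from `prefix` and `ns` at the point where XML is written.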

##########
File path: daffodil-runtime2/src/main/c/xml_writer.c
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_writer.h"
+#include "stack.h"  // for stack_is_empty, stack_pop, stack_push, stack_top
+#include <assert.h> // for assert
+#include <mxml.h>   // for mxml_node_t, mxmlNewElement, mxmlNewOpaquef, ...
+#include <stdint.h> // for int32_t
+#include <stdio.h>  // for NULL, fflush
+
+// Push new XML document on stack.  This function is not
+// thread-safe since it uses static storage.
+
+static const char *
+xmlStartDocument(XMLWriter *writer)
+{
+#define MAX_DEPTH 100
+    static mxml_node_t *array[MAX_DEPTH];
+    stack_init(&writer->stack, array, MAX_DEPTH);
+
+    mxml_node_t *xml = mxmlNewXML("1.0");
+    stack_push(&writer->stack, xml);
+    return xml != NULL ? NULL : "Error making new XML declaration";
+}
+
+// Pop completed XML document off stack and write it to stream
+
+static const char *
+xmlEndDocument(XMLWriter *writer)
+{
+    mxml_node_t *xml = stack_pop(&writer->stack);
+    assert(stack_is_empty(&writer->stack));
+    int status = mxmlSaveFile(xml, writer->stream, MXML_NO_CALLBACK);
+    if (status < 0)
+    {
+        return "Error writing XML document";
+    }
+    status = fflush(writer->stream);
+    mxmlDelete(xml);
+    return status == 0 ? NULL : "Error flushing stream";
+}
+
+// Push new complex element on stack
+
+static const char *
+xmlStartComplex(XMLWriter *writer, const InfosetBase *base)
+{
+    mxml_node_t *complex = NULL;
+    if (!stack_is_full(&writer->stack))
+    {
+        mxml_node_t *parent = stack_top(&writer->stack);
+        char *       name = base->erd->namedQName.name;
+        char *       xmlns = base->erd->namedQName.xmlns;

Review comment:
       So this is the string that will be "xmlns" if the namespace is the default namespace. Yeah, it would be better if the data structures of the InfosetBase objects didn't have artifacts of mxml's API in them.
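   
   One possible way to keep that artifact out of the infoset structures, sketched under the assumption that the ERD carries only a prefix (or NULL): the XML writer could build the attribute name itself. The helper name `xmlns_attr_name` is hypothetical:
   
   ```c
   #include <stdio.h> // for NULL, size_t, snprintf
   
   // Hypothetical helper for the XML writer only: build the attribute
   // name that declares a namespace from a DFDL prefix, which is NULL
   // for the default namespace.  Returns NULL if the buffer is too small.
   static const char *
   xmlns_attr_name(const char *prefix, char *buffer, size_t size)
   {
       if (prefix == NULL)
       {
           return "xmlns"; // default namespace declaration
       }
       const int needed = snprintf(buffer, size, "xmlns:%s", prefix);
       if (needed < 0 || (size_t)needed >= size)
       {
           return NULL; // formatting failed or name was truncated
       }
       return buffer;
   }
   ```
   
   The writer would then pass the returned name and the namespace URI to mxmlElementSetAttr, as the code above already does with the precomputed string.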

##########
File path: daffodil-runtime2/src/main/c/generated_code.c
##########
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "generated_code.h"

Review comment:
       So why is this file in the code base? It's an example of what we want generated code to look like, right?
   I think this belongs in daffodil-runtime2/src/test/c/handCraftedExample1.c so that it is clear it is test-related code, not part of the runtime2 library, nor something that is generated.

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns
+         |        "$ns", // namedQName.ns
+         |    },""".stripMargin
+    qnameInit
+  }
+
+  def addComplexTypeERD(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val count = structs.top.declarations.length
+    val offsetComputations = structs.top.offsetComputations.mkString(",\n")
+    val erdComputations = structs.top.erdComputations.mkString(",\n")
+    val qnameInit = defineQNameInit(context)
+    val complexERD =
+      s"""static const $C ${C}_compute_ERD_offsets;
+         |
+         |static const ptrdiff_t ${C}_offsets[$count] = {
+         |$offsetComputations
+         |};
+         |
+         |static const ERD *${C}_childrenERDs[$count] = {
+         |$erdComputations
+         |};
+         |
+         |static const ERD ${C}_ERD = {
+         |$qnameInit
+         |    COMPLEX,                         // typeCode
+         |    $count,                               // numChildren
+         |    ${C}_offsets,                      // offsets
+         |    ${C}_childrenERDs,                 // childrenERDs
+         |    (ERDInitSelf)&${C}_initSelf,       // initSelf
+         |    (ERDParseSelf)&${C}_parseSelf,     // parseSelf
+         |    (ERDUnparseSelf)&${C}_unparseSelf, // unparseSelf
+         |};
+         |""".stripMargin
+    erds += complexERD
+  }
+
+  def addStruct(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val declarations = structs.top.declarations.mkString("\n")
+    val struct =
+      s"""typedef struct $C
+         |{
+         |    InfosetBase _base;
+         |$declarations
+         |} $C;
+         |""".stripMargin
+    finalStructs += struct
+    val initStatement = s"    instance->_base.erd = &${C}_ERD;"
+    structs.top.initStatements += initStatement
+  }
+
+  def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def addComplexTypeStatements(child: ElementBase): Unit = {
+    val C = child.namedQName.local
+    val e = child.name
+    val initStatement = s"    ${C}_initSelf(&instance->$e);"
+    val parseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_parseSelf(&instance->$e, pstate);
+         |    }""".stripMargin
+    val unparseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_unparseSelf(&instance->$e, ustate);
+         |    }""".stripMargin
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def pushComplexElement(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    structs.push(new ComplexCGState(C))
+  }
+
+  def popComplexElement(context: ElementBase): Unit = {
+    structs.pop()
+  }
+
+  def addSimpleTypeERD(context: ElementBase): Unit = {
+    val e = context.namedQName.local
+    val qnameInit = defineQNameInit(context)
+    val typeCode = context.optPrimType.get match {
+      case PrimType.Int => "PRIMITIVE_INT32"
+      case PrimType.String => "PRIMITIVE_STRING"
+      case p: PrimType => context.SDE("PrimType %s not supported yet.", p.toString)
+    }
+    val erd =
+      s"""static const ERD ${e}_ERD = {
+         |$qnameInit
+         |    $typeCode, // typeCode
+         |    0,               // numChildren
+         |    NULL,            // offsets
+         |    NULL,            // childrenERDs
+         |    NULL,            // initSelf
+         |    NULL,            // parseSelf
+         |    NULL,            // unparseSelf
+         |};
+         |""".stripMargin
+    erds += erd
+    addComputations(context)
+  }
+
+  def addComputations(child: ElementBase): Unit = {
+    val C = structs.top.C
+    val e = child.namedQName.local
+    val offsetComputation = s"    (char *)&${C}_compute_ERD_offsets.$e - (char *)&${C}_compute_ERD_offsets"
+    val erdComputation = s"    &${e}_ERD"
+    structs.top.offsetComputations += offsetComputation
+    structs.top.erdComputations += erdComputation
+  }
+
+  def addFieldDeclaration(context: ThrowsSDE, child: ElementBase): Unit = {
+    val definition = if (child.isSimpleType) {
+      import NodeInfo.PrimType
+      child.optPrimType.get match {
+        case PrimType.Long => "int64_t    "
+        case PrimType.Int => "int32_t    "
+        case x => context.SDE("Unsupported primitive type: " + x)
+      }
+    } else {
+      child.namedQName.local + "         "
+    }
+    structs.top.declarations += s"    $definition ${child.name};"
+  }
+
+  def viewCodeHeader: String = {
+    val structs = finalStructs.mkString("\n")
+    val header =
+      s"""#ifndef GENERATED_CODE_H
+         |#define GENERATED_CODE_H
+         |
+         |#include "common_runtime.h" // for InfosetBase
+         |#include <stdint.h>         // for int32_t
+         |
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |extern InfosetBase *rootInfoset();
+         |
+         |// Define some infoset structures
+         |
+         |$structs
+         |#endif // GENERATED_CODE_H
+         |""".stripMargin
+    header
+  }
+
+  def viewCodeFile(rootElementName: String): String = {
+    val prototypes = this.prototypes.mkString("\n")
+    val erds = this.erds.mkString("\n")
+    val finalImplementation = this.finalImplementation.mkString("\n")
+    val code =
+      s"""#include "generated_code.h"
+         |#include <endian.h> // for be32toh, htobe32
+         |#include <errno.h>  // for errno

Review comment:
       I hope we can get rid of errno. See my comment below about switching from strtol() to strtonum() to eliminate the use of errno.
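   
   For illustration, here is a minimal sketch of a conversion that never touches errno. strtonum() is a BSD extension, so it may not be available in every C library (on Linux it usually comes from libbsd); this sketch therefore swaps in a hand-rolled digit loop instead. The name `parse_int32` and the error strings are hypothetical:
   
   ```c
   #include <stdbool.h> // for bool
   #include <stddef.h>  // for NULL
   #include <stdint.h>  // for int32_t, int64_t, INT32_MAX
   
   // Hypothetical errno-free conversion.  Returns NULL on success or a
   // static error message on failure.
   static const char *
   parse_int32(const char *text, int32_t *location)
   {
       if (text == NULL)
       {
           return "No number found";
       }
   
       // Accept one optional sign
       bool negative = false;
       if (*text == '+' || *text == '-')
       {
           negative = (*text == '-');
           text++;
       }
       if (*text == '\0')
       {
           return "No digits found";
       }
   
       // Accumulate digits in a wider type and stop as soon as the
       // magnitude exceeds what int32_t can hold
       int64_t value = 0;
       for (; *text != '\0'; text++)
       {
           if (*text < '0' || *text > '9')
           {
               return "Found non-digit character";
           }
           value = value * 10 + (*text - '0');
           if (value > (int64_t)INT32_MAX + 1)
           {
               return "Number out of range for int32_t";
           }
       }
       if (negative)
       {
           value = -value;
       }
       if (value > INT32_MAX)
       {
           return "Number out of range for int32_t";
       }
   
       *location = (int32_t)value;
       return NULL;
   }
   ```
   
   Unlike strtol, this sketch rejects leading whitespace, which may or may not be what we want for XML text.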

##########
File path: daffodil-runtime2/src/main/c/xml_writer.c
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_writer.h"
+#include "stack.h"  // for stack_is_empty, stack_pop, stack_push, stack_top
+#include <assert.h> // for assert
+#include <mxml.h>   // for mxml_node_t, mxmlNewElement, mxmlNewOpaquef, ...
+#include <stdint.h> // for int32_t
+#include <stdio.h>  // for NULL, fflush
+
+// Push new XML document on stack.  This function is not
+// thread-safe since it uses static storage.
+
+static const char *
+xmlStartDocument(XMLWriter *writer)
+{
+#define MAX_DEPTH 100
+    static mxml_node_t *array[MAX_DEPTH];
+    stack_init(&writer->stack, array, MAX_DEPTH);
+
+    mxml_node_t *xml = mxmlNewXML("1.0");
+    stack_push(&writer->stack, xml);
+    return xml != NULL ? NULL : "Error making new XML declaration";
+}
+
+// Pop completed XML document off stack and write it to stream
+
+static const char *
+xmlEndDocument(XMLWriter *writer)
+{
+    mxml_node_t *xml = stack_pop(&writer->stack);
+    assert(stack_is_empty(&writer->stack));
+    int status = mxmlSaveFile(xml, writer->stream, MXML_NO_CALLBACK);
+    if (status < 0)
+    {
+        return "Error writing XML document";
+    }
+    status = fflush(writer->stream);
+    mxmlDelete(xml);
+    return status == 0 ? NULL : "Error flushing stream";
+}
+
+// Push new complex element on stack
+
+static const char *
+xmlStartComplex(XMLWriter *writer, const InfosetBase *base)
+{
+    mxml_node_t *complex = NULL;
+    if (!stack_is_full(&writer->stack))
+    {
+        mxml_node_t *parent = stack_top(&writer->stack);
+        char *       name = base->erd->namedQName.name;
+        char *       xmlns = base->erd->namedQName.xmlns;
+        char *       ns = base->erd->namedQName.ns;
+        complex = mxmlNewElement(parent, name);
+        if (xmlns != NULL)
+        {
+            mxmlElementSetAttr(complex, xmlns, ns);

Review comment:
       mxml actually considers xmlns prefix bindings to be XML attributes. I find that surprising. They're really not attributes per se. 

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element

Review comment:
       Hmmm. Not sure you want to do this. This makes these things equivalent:
   ```
   <foo>43</foo>
   <foo>
      43
   </foo>
   ```
   From a straight XML comparison perspective, those do not have the same XML Infoset. They're equivalent only if whitespace is ignored, which is an application decision. The slippery slope of XML equivalency says these are also equivalent:
   ```
   <foo>abc def</foo>
   <foo>abc            def</foo>
   ```
   because adjacent spaces are collapsed. This is all XML "markup language" stuff. When using XML as a data language we have to be more careful about whitespace.
   
   Do we have to handle elements with CDATA bracketing specifically with this mxml library?
   E.g., 
   ```
   <foo><![CDATA[43]]></foo>
   ```
   Or does that get handled by the library? What about entities like "&#x20;"? Does mxml convert those, or hand them through as entities for the application to deal with?
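   
   To show why whitespace handling is an application decision, here is a small sketch contrasting a strict text comparison with one that ignores only leading and trailing whitespace. Both function names are hypothetical and neither is part of mxml or this pull request:
   
   ```c
   #include <ctype.h>   // for isspace
   #include <stdbool.h> // for bool
   #include <string.h>  // for size_t, strcmp, strlen, strncmp
   
   // Strict comparison: every character, including whitespace, counts.
   static bool
   text_equal_strict(const char *a, const char *b)
   {
       return strcmp(a, b) == 0;
   }
   
   // Lenient comparison: ignore leading and trailing whitespace only.
   // Interior whitespace is still significant.
   static bool
   text_equal_trimmed(const char *a, const char *b)
   {
       while (isspace((unsigned char)*a))
       {
           a++;
       }
       while (isspace((unsigned char)*b))
       {
           b++;
       }
       size_t len_a = strlen(a);
       size_t len_b = strlen(b);
       while (len_a > 0 && isspace((unsigned char)a[len_a - 1]))
       {
           len_a--;
       }
       while (len_b > 0 && isspace((unsigned char)b[len_b - 1]))
       {
           len_b--;
       }
       return len_a == len_b && strncmp(a, b, len_a) == 0;
   }
   ```
   
   Under the strict rule the two `<foo>` forms of 43 differ; under the trimmed rule they match, but "abc def" and "abc            def" still differ because interior spaces are not collapsed.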
   
    

##########
File path: daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/runtime2/Runtime2TDMLDFDLProcessor.scala
##########
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.tdml.processor.runtime2
+
+import org.apache.daffodil.api._
+import org.apache.daffodil.compiler.Compiler
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.runtime2.GeneratedCodeCompiler
+import org.apache.daffodil.runtime2.ParseResult
+import org.apache.daffodil.runtime2.UnparseResult
+import org.apache.daffodil.tdml.processor._
+import org.apache.daffodil.xml.XMLUtils
+
+import scala.xml.Node
+
+final class TDMLDFDLProcessorFactory private(
+  private var compiler: Compiler,
+  private var checkAllTopLevel: Boolean,
+  validateDFDLSchemasArg: Boolean)
+  extends AbstractTDMLDFDLProcessorFactory {
+
+  override def validateDFDLSchemas = validateDFDLSchemasArg
+
+  override type R = TDMLDFDLProcessorFactory
+
+  override def implementationName = "daffodil-runtime2"
+
+  def this() = this(compiler = Compiler(validateDFDLSchemas = true),
+    checkAllTopLevel = false,
+    validateDFDLSchemasArg = true)
+
+  private def copy(
+    compiler: Compiler = compiler,
+    checkAllTopLevel: Boolean = checkAllTopLevel,
+    validateDFDLSchemas: Boolean = validateDFDLSchemas) =
+    new TDMLDFDLProcessorFactory(compiler, checkAllTopLevel, validateDFDLSchemas)
+
+  /**
+   * Deprecated methods must be implemented. Some are just stubs though now.
+   */
+  @deprecated("Use withValidateDFDLSchemas.", "2.6.0")
+  override def setValidateDFDLSchemas(bool: Boolean): Unit = {
+    compiler = compiler.withValidateDFDLSchemas(bool)
+  }
+
+  override def withValidateDFDLSchemas(bool: Boolean): TDMLDFDLProcessorFactory = {
+    copy(compiler = compiler.withValidateDFDLSchemas(bool))
+  }
+
+  @deprecated("Use withCheckAllTopLevel.", "2.6.0")
+  override def setCheckAllTopLevel(checkAllTopLevel: Boolean): Unit = {
+    compiler = compiler.withCheckAllTopLevel(checkAllTopLevel)
+  }
+
+  override def withCheckAllTopLevel(checkAllTopLevel: Boolean): TDMLDFDLProcessorFactory = {
+    copy(compiler = compiler.withCheckAllTopLevel(checkAllTopLevel))
+  }
+
+  @deprecated("Use withTunables.", "2.6.0")
+  override def setTunables(tunables: Map[String, String]): Unit =
+    compiler = compiler.withTunables(tunables)
+
+  override def withTunables(tunables: Map[String, String]): TDMLDFDLProcessorFactory =
+    copy(compiler = compiler.withTunables(tunables))
+
+  @deprecated("Use DaffodilTDMLDFDLProcessor.setExternalDFDLVariables.", "2.6.0")
+  override def setExternalDFDLVariables(externalVarBindings: Seq[Binding]): Unit =
+    compiler = compiler.withExternalDFDLVariablesImpl(externalVarBindings)
+
+  override def withExternalDFDLVariables(externalVarBindings: Seq[Binding]): TDMLDFDLProcessorFactory =
+    copy(compiler = compiler.withExternalDFDLVariablesImpl(externalVarBindings))
+
+  @deprecated("Use arguments to getProcessor()", "2.6.0")
+  override def setDistinguishedRootNode(name: String, namespace: String): Unit =
+    compiler = compiler.withDistinguishedRootNode(name, namespace)
+
+  // We're going to replace this method with different code.
+  // Return result is a TDML.CompileResult - so it's the result
+  // of compiling the schema for the test.
+  override def getProcessor(
+    schemaSource: DaffodilSchemaSource,
+    useSerializedProcessor: Boolean,
+    optRootName: Option[String] = None,
+    optRootNamespace: Option[String] = None): TDML.CompileResult = {
+    val pf = compiler.compileSource(schemaSource, optRootName, optRootNamespace)
+    val res = if (pf.isError) {
+      Left(pf.getDiagnostics) // DFDL schema compilation diagnostics
+    } else {
+      // How can we move some of these calls to ProcessorFactory with tunable runtime = "runtime2"?
+      val rootElementName = optRootName.getOrElse("FIXME")

Review comment:
       Add a comment // FIXME to this line, so that tools that look for FIXME comments find it.

##########
File path: daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala
##########
@@ -1307,7 +1312,7 @@ case class UnparserTestCase(ptc: NodeSeq, parentArg: DFDLTestSuite)
     if (testDataLength >= 0) {
       val fullBytesNeeded = (testDataLength + 7) / 8
       if (testData.length != fullBytesNeeded) {
-        throw TDMLException("Unparse result data was was %d bytes, but the result length (%d bits) requires %d bytes.".format(
+        throw TDMLException("Unparse result data was %d bytes, but the result length (%d bits) requires %d bytes.".format(

Review comment:
       This is a bug fix in daffodil!

##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *number_from_xml = mxmlGetOpaque(reader->node);
+    const char *name_from_infoset = erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        if (strcmp(name_from_xml, name_from_infoset) == 0)
+        {
+            // Check for errors when reading the 32-bit integer
+            char *      endptr = NULL;
+            errno = 0; // To distinguish success/failure after call

Review comment:
       Avoid errno. It appears to be thread-safe, but it really isn't, because interrupt contexts can make calls that modify it, and reading it requires an extra call to get the thread-specific errno location. That is slow and causes subtle errors.
   
   Use strtonum() instead (a BSD function; on Linux it is typically provided by libbsd). It reports failure through an error-string argument, so the caller never has to inspect errno, and it does the bounds checking as part of the call.
   
   I realize this is the XML converter, so it isn't likely to run into any errno issues. Still, strtonum() is the better choice because its API is cleaner.
   
   I squirm any time I see errno.
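   
   For targets where strtonum() isn't available, here is a minimal sketch of errno-free parsing. The parse_int32 name and its messages are hypothetical and not part of this PR; it accumulates digits in 64 bits and range-checks as it goes, so it neither reads nor writes errno:

```c
#include <stddef.h> // for NULL
#include <stdint.h> // for int32_t, int64_t, INT32_MAX, INT32_MIN

// Hypothetical helper (not from the PR): parse a decimal 32-bit integer
// from text without using errno.  Returns NULL on success, otherwise a
// static error message.
static const char *
parse_int32(const char *text, int32_t *location)
{
    if (text == NULL || location == NULL)
    {
        return "Missing text or location";
    }

    // Skip any leading whitespace
    const char *p = text;
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') p++;

    // Remember an optional sign
    int negative = 0;
    if (*p == '+' || *p == '-')
    {
        negative = (*p == '-');
        p++;
    }
    if (*p < '0' || *p > '9')
    {
        return "Found no digits in the number";
    }

    // Accumulate in 64 bits so the range check itself cannot overflow
    const int64_t limit = negative ? -(int64_t)INT32_MIN : (int64_t)INT32_MAX;
    int64_t       value = 0;
    for (; *p >= '0' && *p <= '9'; p++)
    {
        value = value * 10 + (*p - '0');
        if (value > limit)
        {
            return "Number out of range for int32";
        }
    }

    // Allow trailing whitespace but nothing else
    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') p++;
    if (*p != '\0')
    {
        return "Found non-number characters after the number";
    }

    *location = (int32_t)(negative ? -value : value);
    return NULL;
}
```

   xmlInt32Elem could then do something like `error_msg = parse_int32(number_from_xml, location);`, assuming that shape suits the surrounding code.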
   

##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/generators/ParserGenerator.scala
##########
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2.generators
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.dpath.NodeInfo
+import org.apache.daffodil.dpath.NodeInfo.PrimType
+import org.apache.daffodil.dsom.ElementBase
+import org.apache.daffodil.exceptions.ThrowsSDE
+
+import scala.collection.mutable
+
+/**
+ * Gives an object the ability to generate code.
+ */
+trait ParserGenerator {
+  def generateCode(state: CodeGeneratorState): Unit
+}
+
+/**
+ * Builds up the state of generated code.
+ */
+class CodeGeneratorState extends DFDL.CodeGeneratorState {
+  private val structs = mutable.Stack[ComplexCGState]()
+  private val prototypes = mutable.ArrayBuffer[String]()
+  private val erds = mutable.ArrayBuffer[String]()
+  private val finalStructs = mutable.ArrayBuffer[String]()
+  private val finalImplementation = mutable.ArrayBuffer[String]()
+
+  def addImplementation(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val initStatements = structs.top.initStatements.mkString("\n")
+    val parserStatements = structs.top.parserStatements.mkString("\n")
+    val unparserStatements = structs.top.unparserStatements.mkString("\n")
+    val prototypeFunctions =
+      s"""static void        ${C}_initSelf($C *instance);
+         |static const char *${C}_parseSelf($C *instance, const PState *pstate);
+         |static const char *${C}_unparseSelf(const $C *instance, const UState *ustate);""".stripMargin
+    prototypes += prototypeFunctions
+    val functions =
+      s"""static void
+         |${C}_initSelf($C *instance)
+         |{
+         |$initStatements
+         |}
+         |
+         |static const char *
+         |${C}_parseSelf($C *instance, const PState *pstate)
+         |{
+         |    const char *error_msg = NULL;
+         |$parserStatements
+         |    return error_msg;
+         |}
+         |
+         |static const char *
+         |${C}_unparseSelf(const $C *instance, const UState *ustate)
+         |{
+         |    const char *error_msg = NULL;
+         |$unparserStatements
+         |    return error_msg;
+         |}
+         |""".stripMargin
+    finalImplementation += functions
+  }
+
+  private def defineQNameInit(context: ElementBase): String = {
+    val qname = context.namedQName.toQNameString
+    val xmlns = if (context.namedQName.prefix.isDefined) s"xmlns:${context.namedQName.prefix.get}" else "xmlns"
+    val ns = context.namedQName.namespace.toStringOrNullIfNoNS
+    // Optimize away xmlns=ns declaration if possible, although this approach may not be entirely correct
+    val parentOpt = context.enclosingElements.headOption
+    val parentNs = if (parentOpt.isDefined) parentOpt.get.namedQName.namespace.toStringOrNullIfNoNS
+    val qnameInit = if (ns == null || ns == parentNs)
+      s"""    {"$qname"},       // namedQName.name"""
+    else
+      s"""    {
+         |        "$qname",              // namedQName.name
+         |        "$xmlns",           // namedQName.xmlns
+         |        "$ns", // namedQName.ns
+         |    },""".stripMargin
+    qnameInit
+  }
+
+  def addComplexTypeERD(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val count = structs.top.declarations.length
+    val offsetComputations = structs.top.offsetComputations.mkString(",\n")
+    val erdComputations = structs.top.erdComputations.mkString(",\n")
+    val qnameInit = defineQNameInit(context)
+    val complexERD =
+      s"""static const $C ${C}_compute_ERD_offsets;
+         |
+         |static const ptrdiff_t ${C}_offsets[$count] = {
+         |$offsetComputations
+         |};
+         |
+         |static const ERD *${C}_childrenERDs[$count] = {
+         |$erdComputations
+         |};
+         |
+         |static const ERD ${C}_ERD = {
+         |$qnameInit
+         |    COMPLEX,                         // typeCode
+         |    $count,                               // numChildren
+         |    ${C}_offsets,                      // offsets
+         |    ${C}_childrenERDs,                 // childrenERDs
+         |    (ERDInitSelf)&${C}_initSelf,       // initSelf
+         |    (ERDParseSelf)&${C}_parseSelf,     // parseSelf
+         |    (ERDUnparseSelf)&${C}_unparseSelf, // unparseSelf
+         |};
+         |""".stripMargin
+    erds += complexERD
+  }
+
+  def addStruct(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    val declarations = structs.top.declarations.mkString("\n")
+    val struct =
+      s"""typedef struct $C
+         |{
+         |    InfosetBase _base;
+         |$declarations
+         |} $C;
+         |""".stripMargin
+    finalStructs += struct
+    val initStatement = s"    instance->_base.erd = &${C}_ERD;"
+    structs.top.initStatements += initStatement
+  }
+
+  def addSimpleTypeStatements(initStatement: String, parseStatement: String, unparseStatement: String): Unit = {
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def addComplexTypeStatements(child: ElementBase): Unit = {
+    val C = child.namedQName.local
+    val e = child.name
+    val initStatement = s"    ${C}_initSelf(&instance->$e);"
+    val parseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_parseSelf(&instance->$e, pstate);
+         |    }""".stripMargin
+    val unparseStatement =
+      s"""    if (error_msg == NULL)
+         |    {
+         |        error_msg = ${C}_unparseSelf(&instance->$e, ustate);
+         |    }""".stripMargin
+    structs.top.initStatements += initStatement
+    structs.top.parserStatements += parseStatement
+    structs.top.unparserStatements += unparseStatement
+  }
+
+  def pushComplexElement(context: ElementBase): Unit = {
+    val C = context.namedQName.local
+    structs.push(new ComplexCGState(C))
+  }
+
+  def popComplexElement(context: ElementBase): Unit = {
+    structs.pop()
+  }
+
+  def addSimpleTypeERD(context: ElementBase): Unit = {
+    val e = context.namedQName.local
+    val qnameInit = defineQNameInit(context)
+    val typeCode = context.optPrimType.get match {
+      case PrimType.Int => "PRIMITIVE_INT32"
+      case PrimType.String => "PRIMITIVE_STRING"
+      case p: PrimType => context.SDE("PrimType %s not supported yet.", p.toString)
+    }
+    val erd =
+      s"""static const ERD ${e}_ERD = {
+         |$qnameInit
+         |    $typeCode, // typeCode
+         |    0,               // numChildren
+         |    NULL,            // offsets
+         |    NULL,            // childrenERDs
+         |    NULL,            // initSelf
+         |    NULL,            // parseSelf
+         |    NULL,            // unparseSelf
+         |};
+         |""".stripMargin
+    erds += erd
+    addComputations(context)
+  }
+
+  def addComputations(child: ElementBase): Unit = {
+    val C = structs.top.C
+    val e = child.namedQName.local
+    val offsetComputation = s"    (char *)&${C}_compute_ERD_offsets.$e - (char *)&${C}_compute_ERD_offsets"
+    val erdComputation = s"    &${e}_ERD"
+    structs.top.offsetComputations += offsetComputation
+    structs.top.erdComputations += erdComputation
+  }
+
+  def addFieldDeclaration(context: ThrowsSDE, child: ElementBase): Unit = {
+    val definition = if (child.isSimpleType) {
+      import NodeInfo.PrimType
+      child.optPrimType.get match {
+        case PrimType.Long => "int64_t    "
+        case PrimType.Int => "int32_t    "
+        case x => context.SDE("Unsupported primitive type: " + x)
+      }
+    } else {
+      child.namedQName.local + "         "
+    }
+    structs.top.declarations += s"    $definition ${child.name};"
+  }
+
+  def viewCodeHeader: String = {
+    val structs = finalStructs.mkString("\n")
+    val header =
+      s"""#ifndef GENERATED_CODE_H
+         |#define GENERATED_CODE_H
+         |
+         |#include "common_runtime.h" // for InfosetBase
+         |#include <stdint.h>         // for int32_t
+         |
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |extern InfosetBase *rootInfoset();
+         |
+         |// Define some infoset structures
+         |
+         |$structs
+         |#endif // GENERATED_CODE_H
+         |""".stripMargin
+    header
+  }
+
+  def viewCodeFile(rootElementName: String): String = {
+    val prototypes = this.prototypes.mkString("\n")
+    val erds = this.erds.mkString("\n")
+    val finalImplementation = this.finalImplementation.mkString("\n")
+    val code =
+      s"""#include "generated_code.h"
+         |#include <endian.h> // for be32toh, htobe32
+         |#include <errno.h>  // for errno
+         |#include <stddef.h> // for ptrdiff_t
+         |#include <stdio.h>  // for NULL, fread, fwrite, size_t, feof, ferror, FILE
+         |#include <string.h> // for strerror
+         |
+         |// Prototypes needed for compilation
+         |
+         |$prototypes
+         |
+         |// Metadata singletons
+         |
+         |$erds
+         |// Return the root of an infoset to be used for parsing or unparsing
+         |
+         |InfosetBase *
+         |rootInfoset()
+         |{
+         |    static $rootElementName    instance;
+         |    InfosetBase *root = &instance._base;
+         |    ${rootElementName}_ERD.initSelf(root);
+         |    return root;
+         |}
+         |
+         |// Methods to initialize, parse, and unparse infoset nodes
+         |
+         |static const char *
+         |eof_or_error_msg(FILE *stream)
+         |{
+         |    if (feof(stream))
+         |    {
+         |        static const char *error_msg = "Got EOF while expecting more input";
+         |        return error_msg;
+         |    }
+         |    else if (ferror(stream))

Review comment:
       ferror doesn't set errno, so the `return strerror(errno)` below depends on something outside this function having set errno appropriately. ISO C doesn't guarantee that fread/fwrite set errno either (only POSIX does). I suggest we pass an argument to this function specifying whether the error occurred in fread or fwrite, then issue an error like "Error during data read for %s", where %s comes from the ERD (which we should also pass).
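   
   A rough sketch of that suggestion, under stated assumptions: the IoDirection enum, the extra parameters, and the caller-supplied buffer are all hypothetical, and elem_name stands in for the name the ERD holds (erd->namedQName.name). The message is built without consulting errno:

```c
#include <stdio.h> // for FILE, feof, ferror, snprintf, size_t, NULL

// Hypothetical: which operation was in progress when the I/O call fell short
typedef enum
{
    IO_READ,
    IO_WRITE
} IoDirection;

// Hypothetical variant of eof_or_error_msg: formats a message into the
// caller's buffer and returns it, or returns NULL if the stream shows
// neither EOF nor an error.
static const char *
eof_or_error_msg(FILE *stream, IoDirection direction, const char *elem_name,
                 char *buffer, size_t size)
{
    if (feof(stream))
    {
        snprintf(buffer, size, "Got EOF while expecting more input for %s",
                 elem_name);
        return buffer;
    }
    else if (ferror(stream))
    {
        snprintf(buffer, size, "Error during data %s for %s",
                 direction == IO_READ ? "read" : "write", elem_name);
        return buffer;
    }
    return NULL;
}
```

   A caller-supplied buffer keeps the function free of static mutable state; a real version might instead store the message in PState/UState.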
   

##########
File path: daffodil-runtime2/src/main/c/generated_code.c
##########
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "generated_code.h"
+#include <endian.h> // for be32toh, htobe32
+#include <errno.h>  // for errno
+#include <stddef.h> // for ptrdiff_t
+#include <stdio.h>  // for NULL, fread, fwrite, size_t, feof, ferror, FILE
+#include <string.h> // for strerror
+
+// Prototypes needed for compilation
+
+static void        c2_initSelf(c2 *instance);
+static const char *c2_parseSelf(c2 *instance, const PState *pstate);
+static const char *c2_unparseSelf(const c2 *instance, const UState *ustate);
+static void        c1_initSelf(c1 *instance);
+static const char *c1_parseSelf(c1 *instance, const PState *pstate);
+static const char *c1_unparseSelf(const c1 *instance, const UState *ustate);
+
+// Metadata singletons
+
+static const ERD e1_ERD = {
+    {"ex:e1"},       // namedQName.name
+    PRIMITIVE_INT32, // typeCode
+    0,               // numChildren
+    NULL,            // offsets
+    NULL,            // childrenERDs
+    NULL,            // initSelf
+    NULL,            // parseSelf
+    NULL,            // unparseSelf
+};
+
+static const ERD e2_ERD = {
+    {"ex:e2"},       // namedQName.name
+    PRIMITIVE_INT32, // typeCode
+    0,               // numChildren
+    NULL,            // offsets
+    NULL,            // childrenERDs
+    NULL,            // initSelf
+    NULL,            // parseSelf
+    NULL,            // unparseSelf
+};
+
+static const ERD e3_ERD = {
+    {"ex:e3"},       // namedQName.name
+    PRIMITIVE_INT32, // typeCode
+    0,               // numChildren
+    NULL,            // offsets
+    NULL,            // childrenERDs
+    NULL,            // initSelf
+    NULL,            // parseSelf
+    NULL,            // unparseSelf
+};
+
+static const c2 c2_compute_ERD_offsets;
+
+static const ptrdiff_t c2_offsets[2] = {
+    (char *)&c2_compute_ERD_offsets.e2 - (char *)&c2_compute_ERD_offsets,
+    (char *)&c2_compute_ERD_offsets.e3 - (char *)&c2_compute_ERD_offsets};
+
+static const ERD *c2_childrenERDs[2] = {&e2_ERD, &e3_ERD};
+
+static const ERD c2_ERD = {
+    {"ex:c2"},                       // namedQName.name
+    COMPLEX,                         // typeCode
+    2,                               // numChildren
+    c2_offsets,                      // offsets
+    c2_childrenERDs,                 // childrenERDs
+    (ERDInitSelf)&c2_initSelf,       // initSelf
+    (ERDParseSelf)&c2_parseSelf,     // parseSelf
+    (ERDUnparseSelf)&c2_unparseSelf, // unparseSelf
+};
+
+static const c1 c1_compute_ERD_offsets;
+
+static const ptrdiff_t c1_offsets[2] = {
+    (char *)&c1_compute_ERD_offsets.e1 - (char *)&c1_compute_ERD_offsets,
+    (char *)&c1_compute_ERD_offsets.c2 - (char *)&c1_compute_ERD_offsets};
+
+static const ERD *c1_childrenERDs[2] = {&e1_ERD, &c2_ERD};
+
+static const ERD c1_ERD = {
+    {
+        "ex:c1",              // namedQName.name
+        "xmlns:ex",           // namedQName.xmlns
+        "http://example.com", // namedQName.ns
+    },
+    COMPLEX,                         // typeCode
+    2,                               // numChildren
+    c1_offsets,                      // offsets
+    c1_childrenERDs,                 // childrenERDs
+    (ERDInitSelf)&c1_initSelf,       // initSelf
+    (ERDParseSelf)&c1_parseSelf,     // parseSelf
+    (ERDUnparseSelf)&c1_unparseSelf, // unparseSelf
+};
+
+// Return the root of an infoset to be used for parsing or unparsing
+
+InfosetBase *
+rootInfoset()
+{
+    static c1    instance;
+    InfosetBase *root = &instance._base;
+    c1_ERD.initSelf(root);
+    return root;
+}
+
+// Methods to initialize, parse, and unparse infoset nodes
+
+static const char *
+eof_or_error_msg(FILE *stream)
+{
+    if (feof(stream))
+    {
+        static const char *error_msg = "Got EOF while expecting more input";
+        return error_msg;
+    }
+    else if (ferror(stream))
+    {
+        return strerror(errno);

Review comment:
       Note: you are depending here on fread/fwrite setting errno. According to the documentation I quickly googled, they aren't guaranteed to.
   
   I find it surprising, but from my googling there is no 100% portable way to find out what the error was, only that one occurred (at least according to the search I did and the Stack Overflow articles I found).
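   
   What is portable is detecting that something went wrong: compare fread's return count with the requested count, then ask feof/ferror which case it was. A minimal sketch, assuming a hypothetical read_int32_be helper (not in this PR) that also assembles the big-endian value by hand instead of using be32toh:

```c
#include <stdint.h> // for int32_t, uint32_t
#include <stdio.h>  // for FILE, fread, feof, size_t, NULL
#include <string.h> // for memcpy

// Hypothetical helper: read one big-endian 32-bit integer from stream.
// Failure is detected from fread's return count, never from errno.
// Returns NULL on success, otherwise a static error message.
static const char *
read_int32_be(FILE *stream, int32_t *location)
{
    unsigned char bytes[4];
    size_t        count = fread(bytes, 1, sizeof(bytes), stream);
    if (count < sizeof(bytes))
    {
        // A short read means either EOF or an error; we can say which
        // one happened but not why
        return feof(stream) ? "Got EOF while expecting more input"
                            : "Error during data read";
    }

    // Assemble the bytes most significant first
    uint32_t bits = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
                    ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];

    // Copy the bit pattern into the two's-complement int32_t
    memcpy(location, &bits, sizeof(bits));
    return NULL;
}
```

   The messages are fixed strings, so nothing here depends on strerror(errno) being meaningful.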

##########
File path: daffodil-runtime2/src/main/c/daffodil_argp.c
##########
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "daffodil_argp.h" // for daffodil_cli, daffodil_parse_cli, ...
+#include <argp.h>          // for argp_state, argp_error, error_t, argp_parse
+#include <stdio.h>         // for sprintf
+#include <stdlib.h>        // for putenv, NULL
+#include <string.h>        // for strlen, strcmp
+
+// Initialize our "daffodil" name and version
+
+const char *argp_program_version = "Apache Daffodil (runtime2) 0.1";
+
+// Initialize our "daffodil parse" CLI options
+
+struct daffodil_parse_cli daffodil_parse = {
+    "xml", // default infoset type
+    "-",   // default infile
+    "-",   // default outfile
+};
+
+static const struct argp_option parse_options[] = {
+    {"infoset-type", 'I', "<infoset_type>", 0,
+     "Infoset type to output. Must be one of 'xml' or 'null'"},
+
+    {"output", 'o', "<file>", 0,
+     "Write output to a given file. If not given or is -, output is written to "
+     "stdout"},
+
+    {0}};
+
+static error_t parse_handler(int key, char *arg, struct argp_state *state);
+
+static const char parse_args_doc[] = "[infile]";
+
+static const char parse_doc[] =
+    "\n"
+    "Parse a file using a DFDL schema\n"
+    "\n"
+    "Parse Options:"
+    "\v"
+    " Trailing arguments:\n"
+    "  infile (not required)      input file to parse. "
+    "If not specified, or a value of -, reads from stdin";
+
+static const struct argp parse_argp = {
+    parse_options,  // array of CLI options
+    parse_handler,  // function to get these CLI options
+    parse_args_doc, // short usage documentation
+    parse_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil parse" CLI options
+
+static error_t
+parse_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_parse_cli *parse = state->input;
+
+    switch (key)
+    {
+    case 'I':
+        parse->infoset_type = arg;
+        break;
+
+    case 'o':
+        parse->outfile = arg;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (state->arg_num)
+        {
+            argp_error(state, "too many arguments: %s", arg);
+        }
+        parse->infile = arg;
+        break;
+
+    default:
+        return ARGP_ERR_UNKNOWN;
+    }
+
+    return 0;
+}
+
+// Parse our "daffodil parse" command line interface
+
+static error_t
+parse_daffodil_parse_cli(struct argp_state *state)
+{
+    int    argc = state->argc - state->next + 1;
+    char **argv = &state->argv[state->next - 1];
+    char * old_cmd = argv[0];
+    char   new_cmd[strlen(state->name) + strlen(" parse") + 1];
+
+    sprintf(new_cmd, "%s parse", state->name);
+    argv[0] = new_cmd;
+
+    error_t status = argp_parse(&parse_argp, argc, argv, ARGP_IN_ORDER, &argc,
+                                &daffodil_parse);
+
+    argv[0] = old_cmd;
+    state->next += argc - 1;
+
+    return status;
+}
+
+// Initialize our "daffodil unparse" CLI options
+
+struct daffodil_unparse_cli daffodil_unparse = {
+    "xml", // default infoset type
+    "-",   // default infile
+    "-",   // default outfile
+};
+
+static const struct argp_option unparse_options[] = {
+    {"infoset-type", 'I', "<infoset_type>", 0,
+     "Infoset type to unparse. Must be 'xml'"},
+
+    {"output", 'o', "<file>", 0,
+     "Write output to file. If not given or is -, output is written to "
+     "standard output"},
+
+    {0}};
+
+static error_t unparse_handler(int key, char *arg, struct argp_state *state);
+
+static const char unparse_args_doc[] = "[infile]";
+
+static const char unparse_doc[] =
+    "\n"
+    "Unparse an infoset file using a DFDL schema\n"
+    "\n"
+    "Unparse Options:"
+    "\v"
+    " Trailing arguments:\n"
+    "  infile (not required)      input file to unparse. If not specified, or "
+    "a value of -, reads from stdin";
+
+static const struct argp unparse_argp = {
+    unparse_options,  // array of CLI options
+    unparse_handler,  // function to get these CLI options
+    unparse_args_doc, // short usage documentation
+    unparse_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil unparse" CLI options
+
+static error_t
+unparse_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_unparse_cli *unparse = state->input;
+
+    switch (key)
+    {
+    case 'I':
+        unparse->infoset_type = arg;
+        break;
+
+    case 'o':
+        unparse->outfile = arg;
+        break;
+
+    case ARGP_KEY_ARG:
+        if (state->arg_num)
+        {
+            argp_error(state, "too many arguments: %s", arg);
+        }
+        unparse->infile = arg;
+        break;
+
+    default:
+        return ARGP_ERR_UNKNOWN;
+    }
+
+    return 0;
+}
+
+// Parse our "daffodil unparse" command line interface
+
+static error_t
+parse_daffodil_unparse_cli(struct argp_state *state)
+{
+    int    argc = state->argc - state->next + 1;
+    char **argv = &state->argv[state->next - 1];
+    char * old_cmd = argv[0];
+    char   new_cmd[strlen(state->name) + strlen(" unparse") + 1];
+
+    sprintf(new_cmd, "%s unparse", state->name);
+    argv[0] = new_cmd;
+
+    error_t status = argp_parse(&unparse_argp, argc, argv, ARGP_IN_ORDER, &argc,
+                                &daffodil_unparse);
+
+    argv[0] = old_cmd;
+    state->next += argc - 1;
+
+    return status;
+}
+
+// Initialize our "daffodil" CLI options
+
+struct daffodil_cli daffodil_cli = {
+    DAFFODIL_NONE, // default subcommand
+    0,             // default verbosity
+};
+
+static const struct argp_option daffodil_options[] = {
+    {"verbose", 'v', 0, 0, "Increment verbosity level, one level for each -v",
+     -1},
+
+    {0}};
+
+static error_t daffodil_handler(int key, char *arg, struct argp_state *state);
+
+static const char daffodil_args_doc[] = "<subcommand> [SUBCOMMAND_OPTION...]";
+
+static const char daffodil_doc[] =
+    "\n"
+    "Global Options:"
+    "\v"
+    "Subcommands:\n"
+    "  parse         Parse data to a DFDL infoset\n"
+    "  unparse       Unparse a DFDL infoset\n"
+    "\n"
+    "Run 'daffodil <subcommand> --help' for subcommand specific options";
+
+static const struct argp daffodil_argp = {
+    daffodil_options,  // array of CLI options
+    daffodil_handler,  // function to get these CLI options
+    daffodil_args_doc, // short usage documentation
+    daffodil_doc,      // long help documentation
+};
+
+// Handle callbacks to get our "daffodil" CLI options
+
+static error_t
+daffodil_handler(int key, char *arg, struct argp_state *state)
+{
+    struct daffodil_cli *daffodil = state->input;

Review comment:
       Can we name this local variable "cli" instead of "daffodil"?

##########
File path: daffodil-tdml-processor/src/main/scala/org/apache/daffodil/tdml/processor/runtime2/Runtime2TDMLDFDLProcessor.scala
##########
@@ -0,0 +1,234 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.tdml.processor.runtime2
+
+import org.apache.daffodil.api._
+import org.apache.daffodil.compiler.Compiler
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.runtime2.GeneratedCodeCompiler
+import org.apache.daffodil.runtime2.ParseResult
+import org.apache.daffodil.runtime2.UnparseResult
+import org.apache.daffodil.tdml.processor._
+import org.apache.daffodil.xml.XMLUtils
+
+import scala.xml.Node
+
+final class TDMLDFDLProcessorFactory private(
+  private var compiler: Compiler,
+  private var checkAllTopLevel: Boolean,
+  validateDFDLSchemasArg: Boolean)
+  extends AbstractTDMLDFDLProcessorFactory {
+
+  override def validateDFDLSchemas = validateDFDLSchemasArg
+
+  override type R = TDMLDFDLProcessorFactory
+
+  override def implementationName = "daffodil-runtime2"
+
+  def this() = this(compiler = Compiler(validateDFDLSchemas = true),
+    checkAllTopLevel = false,
+    validateDFDLSchemasArg = true)
+
+  private def copy(
+    compiler: Compiler = compiler,
+    checkAllTopLevel: Boolean = checkAllTopLevel,
+    validateDFDLSchemas: Boolean = validateDFDLSchemas) =
+    new TDMLDFDLProcessorFactory(compiler, checkAllTopLevel, validateDFDLSchemas)
+
+  /**
+   * Deprecated methods must be implemented. Some are just stubs though now.
+   */
+  @deprecated("Use withValidateDFDLSchemas.", "2.6.0")
+  override def setValidateDFDLSchemas(bool: Boolean): Unit = {
+    compiler = compiler.withValidateDFDLSchemas(bool)
+  }
+
+  override def withValidateDFDLSchemas(bool: Boolean): TDMLDFDLProcessorFactory = {
+    copy(compiler = compiler.withValidateDFDLSchemas(bool))
+  }
+
+  @deprecated("Use withCheckAllTopLevel.", "2.6.0")
+  override def setCheckAllTopLevel(checkAllTopLevel: Boolean): Unit = {
+    compiler = compiler.withCheckAllTopLevel(checkAllTopLevel)
+  }
+
+  override def withCheckAllTopLevel(checkAllTopLevel: Boolean): TDMLDFDLProcessorFactory = {
+    copy(compiler = compiler.withCheckAllTopLevel(checkAllTopLevel))
+  }
+
+  @deprecated("Use withTunables.", "2.6.0")
+  override def setTunables(tunables: Map[String, String]): Unit =
+    compiler = compiler.withTunables(tunables)
+
+  override def withTunables(tunables: Map[String, String]): TDMLDFDLProcessorFactory =
+    copy(compiler = compiler.withTunables(tunables))
+
+  @deprecated("Use DaffodilTDMLDFDLProcessor.setExternalDFDLVariables.", "2.6.0")
+  override def setExternalDFDLVariables(externalVarBindings: Seq[Binding]): Unit =
+    compiler = compiler.withExternalDFDLVariablesImpl(externalVarBindings)
+
+  override def withExternalDFDLVariables(externalVarBindings: Seq[Binding]): TDMLDFDLProcessorFactory =
+    copy(compiler = compiler.withExternalDFDLVariablesImpl(externalVarBindings))
+
+  @deprecated("Use arguments to getProcessor()", "2.6.0")
+  override def setDistinguishedRootNode(name: String, namespace: String): Unit =
+    compiler = compiler.withDistinguishedRootNode(name, namespace)
+
+  // We're going to replace this method with different code.
+  // Return result is a TDML.CompileResult - so it's the result
+  // of compiling the schema for the test.
+  override def getProcessor(
+    schemaSource: DaffodilSchemaSource,
+    useSerializedProcessor: Boolean,
+    optRootName: Option[String] = None,
+    optRootNamespace: Option[String] = None): TDML.CompileResult = {
+    val pf = compiler.compileSource(schemaSource, optRootName, optRootNamespace)
+    val res = if (pf.isError) {
+      Left(pf.getDiagnostics) // DFDL schema compilation diagnostics
+    } else {
+      // How can we move some of these calls to ProcessorFactory with tunable runtime = "runtime2"?
+      val rootElementName = optRootName.getOrElse("FIXME")
+      val codeGeneratorState = pf.generateCode()
+      val generatedCodeCompiler = new GeneratedCodeCompiler(pf)
+      generatedCodeCompiler.compile(rootElementName, codeGeneratorState)
+      val compileResult = if (pf.isError) {
+        Left(pf.getDiagnostics) // C code compilation diagnostics
+      } else {
+        val dp = new Runtime2TDMLDFDLProcessor(generatedCodeCompiler)
+        Right((pf.getDiagnostics, dp))
+      }
+      compileResult
+    }
+    res
+  }
+
+}
+
+/**
+ * Delegates all execution, error gathering, error access to the Runtime2DataProcessor object.
+ * The responsibility of this class is just for TDML matching up. That is dealing with TDML
+ * XML Infosets, feeding to the unparser, creating XML from the result created by the
+ * Runtime2DataProcessor object. All the "real work" is done by generatedCodeCompiler.dataProcessor.
+ */
+class Runtime2TDMLDFDLProcessor(generatedCodeCompiler: GeneratedCodeCompiler) extends TDMLDFDLProcessor {
+
+  override type R = Runtime2TDMLDFDLProcessor
+
+  private val dataProcessor = generatedCodeCompiler.dataProcessor
+  private var anyErrors: Boolean = false
+  private var diagnostics: Seq[Diagnostic] = Nil
+
+  @deprecated("Use withDebugging.", "2.6.0")
+  override def setDebugging(b: Boolean) = ???
+  override def withDebugging(b: Boolean): Runtime2TDMLDFDLProcessor = this
+
+  @deprecated("Use withTracing.", "2.6.0")
+  override def setTracing(bool: Boolean): Unit = ???
+  override def withTracing(bool: Boolean): Runtime2TDMLDFDLProcessor = this
+
+  @deprecated("Use withDebugger.", "2.6.0")
+  override def setDebugger(db: AnyRef): Unit = ???
+  override def withDebugger(db: AnyRef): Runtime2TDMLDFDLProcessor = this
+
+  @deprecated("Use withValidationMode.", "2.6.0")
+  override def setValidationMode(validationMode: ValidationMode.Type): Unit = ???
+  override def withValidationMode(validationMode: ValidationMode.Type): Runtime2TDMLDFDLProcessor = this
+
+  @deprecated("Use withExternalDFDLVariables.", "2.6.0")
+  override def setExternalDFDLVariables(externalVarBindings: Seq[Binding]): Unit = ???
+  override def withExternalDFDLVariables(externalVarBindings: Seq[Binding]): Runtime2TDMLDFDLProcessor = this
+
+  // Actually run the C code and save any errors to be returned here
+  override def isError: Boolean = anyErrors
+  override def getDiagnostics: Seq[Diagnostic] = diagnostics
+
+  // This part will change a lot (it will execute C code instead).
+  // Whatever the parse produces needs to be converted into XML for comparison.
+  // We'll need a way to convert, say, a C struct to XML, and XML to C struct.
+  // The C code will need a bunch of toXML methods so it can produce output
+  // for comparison.
+  override def parse(is: java.io.InputStream, lengthLimitInBits: Long): TDMLParseResult = {
+    // We will run the generated and compiled C code, collect and save any errors
+    // and diagnostics to be returned in isError and getDiagnostics, and build an
+    // infoset.  Our context here is TDML-related, so we need to move that functionality
+    // to something generic that we call from here, you're saying.  I got it.  So we

Review comment:
       Clean up the conversational comments. Much of this discussion reflects uncertainty about things that I think are much clearer now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-daffodil] tuxji commented on pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
tuxji commented on pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#issuecomment-701394672


   Yes, I created the `wip-daffodil-2202-runtime2` branch intentionally, so that I could merge my PR into that branch rather than the main branch.  I haven't received a response from the os-lib author to the issue I opened (https://github.com/lihaoyi/os-lib/issues/47), and the absence of a Scala 2.11 os-lib build is preventing my PR from passing 6 of the 12 compile checks.  Mike suggested using an integration branch so we don't mess up anyone else's PR.  Do you want me to use an integration branch, or should I ask the dev list whether it is time for Daffodil to drop support for Scala 2.11?





[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499654666



##########
File path: daffodil-runtime2/src/main/c/common_runtime.c
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more

Review comment:
       Eventually, this "static code" is going to become a rather large library, as big as the runtime1 parser and unparser together, and bigger still in lines of code given that it is C. Think of a library on the order of 50K lines of code. So I think it needs to be built as a static library.
   
   I am somewhat surprised that just putting C code into src/main/c and src/main/h doesn't work transparently. We can't be the first project mixing C code and Java/Scala using SBT. Lots of JNI projects must do this, right?







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499586082



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/GeneratedCodeCompiler.scala
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import org.apache.commons.io.FileUtils
+import org.apache.daffodil.compiler.ProcessorFactory
+import org.apache.daffodil.dsom.SchemaDefinitionError
+import org.apache.daffodil.runtime2.generators.CodeGeneratorState
+import org.apache.daffodil.util.Misc
+import os.Pipe
+
+class GeneratedCodeCompiler(pf: ProcessorFactory) {

Review comment:
       Agreed, I'd like to hear others' thoughts. I don't think I have a strong preference either way.
   
   I'm wondering if there are any IDE issues. We can definitely support this C compiler plugin in sbt, but if people use other IDEs, they may have a harder time building everything, since this is a fairly Java-focused project. I'm assuming Eclipse and others can deal with mixing Java and C, but can they package a static lib in a jar? We'd probably want to update our Eclipse configs so this "just works" for people wanting to use SBT. There are other IDEs to consider as well.







[GitHub] [incubator-daffodil] stevedlawrence commented on pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#issuecomment-701353643


   Oops, I think I accidentally closed this pull request. I deleted the ``wip-daffodil-2202-runtime2`` branch since it had no commits, thinking it had just been accidentally pushed to the ASF repo instead of someone's fork. But perhaps this PR was intentionally created to merge into this new branch rather than master?





[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r497840101



##########
File path: daffodil-runtime2/src/main/c/xml_reader.c
##########
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "xml_reader.h"
+#include <errno.h>  // for errno, ERANGE
+#include <limits.h> // for LONG_MAX, LONG_MIN
+#include <mxml.h>   // for mxmlWalkNext, mxmlGetElement, mxmlGetType, ...
+#include <stdint.h> // for int32_t, INT32_MAX, INT32_MIN
+#include <stdlib.h> // for NULL, strtol
+#include <string.h> // for strcmp, strerror
+
+// Read XML data from file before walking infoset
+
+static const char *
+xmlStartDocument(XMLReader *reader)
+{
+    // Load the XML data into memory
+    reader->xml = mxmlLoadFile(NULL, reader->stream, MXML_OPAQUE_CALLBACK);
+    reader->node = reader->xml;
+    if (reader->node == NULL) {
+        return "Unable to read XML data from input file";
+    }
+
+    // Consume the <?xml line if there is one
+    const char *name = mxmlGetElement(reader->node);
+    if (name && strncmp(name, "?xml", strlen("?xml")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+        name = mxmlGetElement(reader->node);
+    }
+
+    // Consume a comment if there is one
+    if (name && strncmp(name, "!--", strlen("!--")) == 0) {
+        do
+        {
+            reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+        } while (mxmlGetType(reader->node) == MXML_OPAQUE);
+    }
+
+    return reader->node ? NULL : "Ran out of XML data";
+}
+
+// Delete XML data after walking infoset
+
+static const char *
+xmlEndDocument(XMLReader *reader)
+{
+    // Consume any remaining newlines or whitespace
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Check whether we have consumed all of the XML data
+    if (reader->node != NULL) {
+        // This code path exits the program - no need to call mxmlDelete
+        return "Did not consume all of the XML data";
+    }
+
+    // Free the storage allocated to hold the XML data
+    mxmlDelete(reader->xml);
+    reader->xml = NULL;
+    reader->node = NULL;
+    return NULL;
+}
+
+// Continue walking both XML data and infoset in lockstep
+
+static const char *
+xmlStartComplex(XMLReader *reader, const InfosetBase *base)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *name_from_infoset = base->erd->namedQName.name;
+    reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+
+    // Check whether we are walking both XML data and infoset in lockstep
+    if (name_from_xml && name_from_infoset)
+    {
+        return strcmp(name_from_xml, name_from_infoset) == 0
+                   ? NULL
+                   : "Found mismatch between XML data and infoset";
+    }
+    else
+    {
+        return "Ran out of XML data";
+    }
+}
+
+// Consume XML data only on start events, not end events
+
+static const char *
+xmlEndComplex(XMLReader *reader, const InfosetBase *base)
+{
+    (void)reader;
+    (void)base;
+    return NULL;
+}
+
+// Read 32-bit integer value from XML data
+
+static const char *
+xmlInt32Elem(XMLReader *reader, const ERD *erd, int32_t *location)
+{
+    // Consume any newlines or whitespace before the element
+    while (mxmlGetType(reader->node) == MXML_OPAQUE) {
+        reader->node = mxmlWalkNext(reader->node, reader->xml, MXML_DESCEND);
+    }
+
+    // Get the element and consume it
+    const char *name_from_xml = mxmlGetElement(reader->node);
+    const char *number_from_xml = mxmlGetOpaque(reader->node);

Review comment:
       Ok, I read some of the mxml doc. Yeah, no mxmlGetTYPE calls. Surprising. 
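
   Since Mini-XML hands element content back as opaque text, the conversion to `int32_t` has to happen on our side. Here is a minimal sketch of that conversion, assuming plain `strtol` from the C standard library. The helper name `parse_int32_text` and its error strings are hypothetical and not taken from the PR:

   ```c
   #include <assert.h>
   #include <errno.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <stdlib.h>

   // Hypothetical helper: convert opaque element text to an int32_t.
   // Returns NULL on success or a static error message on failure.
   const char *
   parse_int32_text(const char *text, int32_t *location)
   {
       assert(location != NULL);
       if (text == NULL) {
           return "Found no text to convert";
       }

       // strtol skips leading whitespace and reports overflow via errno
       char *endptr = NULL;
       errno = 0;
       long value = strtol(text, &endptr, 10);
       if (endptr == text) {
           return "Found no number in XML data";
       }

       // Allow trailing whitespace, but nothing else, after the digits
       while (*endptr == ' ' || *endptr == '\t' || *endptr == '\n' || *endptr == '\r') {
           endptr++;
       }
       if (*endptr != '\0') {
           return "Found non-number characters in XML data";
       }

       // long may be wider than 32 bits, so check both errno and the range
       if (errno == ERANGE || value < INT32_MIN || value > INT32_MAX) {
           return "Number in XML data out of range for int32";
       }

       *location = (int32_t)value;
       return NULL;
   }
   ```

   A caller would pass the string returned by `mxmlGetOpaque` as `text` and treat any non-NULL return value as an error message to report.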







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499583699



##########
File path: build.sbt
##########
@@ -43,6 +46,32 @@ lazy val runtime1         = Project("daffodil-runtime1", file("daffodil-runtime1
                               .dependsOn(io, lib % "test->test", udf, macroLib % "compile-internal, test-internal")
                               .settings(commonSettings, usesMacros)
 
+val runtime2StaticLib     = Library("libruntime2.a")
+lazy val runtime2         = Project("daffodil-runtime2", file("daffodil-runtime2")).configs(IntegrationTest)

Review comment:
       Agreed with your suggestions. Though if we do decide to change names, I suggest we do the rename separately (except maybe runtime2, since it's not merged yet). Sometimes GitHub doesn't detect that a change is a rename, and then it is really difficult to see which changes are just renames and which are actual changes. So mixing changes with renames can make a PR hard to review. I too would like to hear what other devs think of this. It might also be worth a discussion on the dev list, since not everyone is paying detailed attention to this review. This comment is really unrelated to the PR, but it is something I thought was worth bringing up since we now have an actual second runtime.







[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498505729



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/grammar/GrammarTerm.scala
##########
@@ -51,7 +52,8 @@ import org.apache.daffodil.dsom.Term
 abstract class Gram(contextArg: SchemaComponent)
   extends OOLAGHostImpl(contextArg)
   with BasicComponent
-  with GramRuntime1Mixin {
+  with GramRuntime1Mixin
+  with GramRuntime2Mixin {

Review comment:
       I am coming around to Brandon's way of thinking here. The various kinds of code aren't in the same spot file-wise; they're in mixins, and the mixins separate the concerns into different files. But they are hanging off the same classes, so in that sense they are all in the same place.
   
   So for runtime2, we can try treating the grammar objects as just a data structure passed to the runtime2 back end "compiler", which then dispatches on the type of grammar object.
   
   The class duplication I was worrying about is a false issue. The backend-specific functions don't have to mirror the grammar object constructors. The backend compiler functions just get passed the grammar object and access things off of it.
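
   To make the dispatch idea concrete, here is a rough sketch. It is written in C only to keep it short and self-contained; in Daffodil the grammar objects are Scala classes and the dispatch would more naturally be a pattern match. The `Gram` struct, `GramKind`, and `emit_decls` below are invented stand-ins for illustration, not code from the PR:

   ```c
   #include <assert.h>
   #include <stddef.h>
   #include <stdio.h>

   // Hypothetical stand-ins for two kinds of grammar object
   typedef enum { GRAM_SEQUENCE, GRAM_INT32 } GramKind;

   typedef struct Gram {
       GramKind           kind;
       const char        *name;         // element name
       const struct Gram *children;     // used by GRAM_SEQUENCE only
       size_t             num_children; // used by GRAM_SEQUENCE only
   } Gram;

   // Backend function: it is handed a grammar object, dispatches on its
   // kind, and reads whatever it needs off the object. Writes C field
   // declarations into out and returns the number of characters written,
   // or -1 if out is too small or the kind is unknown.
   int
   emit_decls(const Gram *gram, char *out, size_t size)
   {
       assert(gram != NULL && out != NULL && size > 0);
       switch (gram->kind) {
       case GRAM_INT32: {
           int n = snprintf(out, size, "int32_t %s;\n", gram->name);
           return (n < 0 || (size_t)n >= size) ? -1 : n;
       }
       case GRAM_SEQUENCE: {
           int used = 0;
           out[0] = '\0';
           for (size_t i = 0; i < gram->num_children; i++) {
               // used < size always holds here, so the remaining size is > 0
               int n = emit_decls(&gram->children[i], out + used, size - (size_t)used);
               if (n < 0) {
                   return -1;
               }
               used += n;
           }
           return used;
       }
       }
       return -1; // unknown kind
   }
   ```

   The point of the sketch is that nothing backend-specific lives on the grammar objects themselves: adding another backend means adding another function like `emit_decls`, not another mixin.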







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499577464



##########
File path: daffodil-cli/build.sbt
##########
@@ -42,6 +42,18 @@ mappings in Universal ++= Seq(
   baseDirectory.value / "README.md" -> "README.md",
 )
 
+mappings in Universal ++= Seq(
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "common_runtime.h" -> "include/common_runtime.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "daffodil_argp.h" -> "include/daffodil_argp.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "stack.h" -> "include/stack.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "xml_reader.h" -> "include/xml_reader.h",
+  baseDirectory.value / ".." / "daffodil-runtime2" / "src" / "main" / "c" / "xml_writer.h" -> "include/xml_writer.h",
+)
+
+mappings in Universal ++= Seq(
+  baseDirectory.value / ".." / "daffodil-runtime2" / "target" / "streams" / "compile" / "ccTargetMap" / "_global" / "streams" / "compile" / "sbtcc.Library" / "libruntime2.a" -> "lib/libruntime2.a",

Review comment:
       I think this is something that has to be done in the main ``build.sbt`` config. I don't think you can access settings in a sibling/parent project. The ``..`` path stuff is sort of a hack to get access to files in other subprojects, since sbt doesn't care if you directly access files. But it can't be done with settings. So the main build.sbt probably needs something along the lines of this:
   
   ```scala
   lazy val cli  = Project("daffodil-cli", file("daffodil-cli")).configs(IntegrationTest)
                     ...
                     .settings(mappings in Universal ++= (Compile / runtime2 / ccLinkLibraries)) 
   ```
   
   I'm not sure of the exact syntax, but it's going to be something like that.
   







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498472789



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.File
+import java.io.InputStream
+import java.io.OutputStream
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.api.DaffodilTunables
+import org.apache.daffodil.api.DataLocation
+import org.apache.daffodil.api.ValidationMode
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.processors.Failure
+import org.apache.daffodil.processors.ProcessorResult
+import org.apache.daffodil.processors.Success
+import org.apache.daffodil.processors.VariableMap
+import org.apache.daffodil.processors.WithDiagnosticsImpl
+import org.apache.daffodil.processors.parsers.ParseError
+import org.apache.daffodil.processors.unparsers.UnparseError
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+import os.Pipe
+
+/**
+ * Effectively a scala proxy object that does its work via the underlying C-code.
+ * Will need to consider how to use features of underlying C-code to get infoset,
+ * walk infoset, generate XML for use by TDML tests.
+ */
+class Runtime2DataProcessor(executableFile: os.Path) extends DFDL.DataProcessorBase {
+  /**
+   * Returns a data processor with all the same state, but the validation mode changed to that of the argument.
+   *
+   * Note that the default validation mode is "off", that is, no validation is performed.
+   */
+  override def withValidationMode(mode: ValidationMode.Type): DFDL.DataProcessor = ???
+
+  override def withTunable(name: String, value: String): DFDL.DataProcessor = ???
+
+  override def withTunables(tunables: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: File): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Seq[Binding]): DFDL.DataProcessor = ???
+
+  override def validationMode: ValidationMode.Type = ???
+
+  override def getTunables(): DaffodilTunables = ???
+
+  override def save(output: DFDL.Output): Unit = ???
+
+  override def variableMap: VariableMap = ???
+
+  override def setValidationMode(mode: ValidationMode.Type): Unit = ???
+
+  override def setExternalVariables(extVars: Map[String, String]): Unit = ???
+
+  override def setExternalVariables(extVars: File): Unit = ???
+
+  override def setExternalVariables(extVars: File, tunable: DaffodilTunables): Unit = ???
+
+  override def setExternalVariables(extVars: Seq[Binding]): Unit = ???
+
+  override def setTunable(tunable: String, value: String): Unit = ???
+
+  override def setTunables(tunables: Map[String, String]): Unit = ???
+
+  /**
+   * Returns an object which contains the result, and/or diagnostic information.
+   */
+  def parse(input: InputStream): ParseResult = {

Review comment:
       Regarding the TDML requirement, I think this can all be handled via the TDMLDFDLProcessor API. Perhaps different backends will also want different TDMLDFDLProcessor implementations. The runtime2 implementation would generate and compile code, execute the binary with the appropriate command line arguments based on the test configuration, capture the output, and return it to the TDML Runner to check for correctness. We already have support for tests to define which implementation they work for and the ability to switch between implementations. So it's just a matter of defining this new implementation. There is certainly a concept of a DFDLProcessor specific to runtime2, but it doesn't necessarily need to match the DataProcessor API that runtime1 uses. It really just needs to match the TDMLDFDLProcessor API.
   
   Agreed on the second requirement. Generating a shared object with some standard API that users can easily call ultimately seems like the best approach. Generating another CLI implemented in C is easy for testing, but is likely not how people will want to use it in the real world. I could imagine that an implementation of the TDMLDFDLProcessor could use JNI to dynamically load these generated libraries and hook into the right function calls to parse/unparse.
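   
   As a rough idea of what such a callable API could look like for a schema with two `xs:int` elements (32-bit big-endian, which is all this PR handles so far), here is a sketch. All names in it (`ExampleInfoset`, `example_parse`, `example_unparse`) are made up for illustration; the real generated code may expose something quite different:

   ```c
   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   // Hypothetical infoset for a schema with two xs:int elements
   typedef struct {
       int32_t e1;
       int32_t e2;
   } ExampleInfoset;

   // Decode one 32-bit big-endian two's-complement integer
   static int32_t
   read_be_int32(const uint8_t *p)
   {
       uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
       // Avoid an implementation-defined unsigned-to-signed conversion
       return u <= INT32_MAX ? (int32_t)u : -(int32_t)(~u) - 1;
   }

   // Encode one 32-bit integer as big-endian bytes
   static void
   write_be_int32(int32_t value, uint8_t *p)
   {
       uint32_t u = (uint32_t)value; // well-defined modulo 2^32
       p[0] = (uint8_t)(u >> 24);
       p[1] = (uint8_t)(u >> 16);
       p[2] = (uint8_t)(u >> 8);
       p[3] = (uint8_t)u;
   }

   // Hypothetical library entry points; both return NULL on success
   // or a static error message on failure.
   const char *
   example_parse(const uint8_t *data, size_t length, ExampleInfoset *infoset)
   {
       assert(infoset != NULL);
       if (data == NULL || length < 8) {
           return "Not enough data to parse";
       }
       infoset->e1 = read_be_int32(data);
       infoset->e2 = read_be_int32(data + 4);
       return NULL;
   }

   const char *
   example_unparse(const ExampleInfoset *infoset, uint8_t *out, size_t size)
   {
       assert(infoset != NULL);
       if (out == NULL || size < 8) {
           return "Not enough room to unparse";
       }
       write_be_int32(infoset->e1, out);
       write_be_int32(infoset->e2, out + 4);
       return NULL;
   }
   ```

   A JNI-based TDMLDFDLProcessor, or any other caller, would only need these two functions plus the infoset struct, whether they are linked from a shared object or from a static library.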
   
   Regarding the save-parser, I'm wondering if options really become part of the runtime. Different runtimes can determine which options they expect and what actions to take.
   
   For example, the --debug/--trace options might not make sense with runtime2. Debugging the generated C code likely just means someone should compile the code and execute it with GDB. And the save-parser command for runtime2 likely wants additional parameters specific to it, like C compiler options or an output directory for header files. The runtimes could very well require very different parameters. I'm not sure the following is a good idea, but maybe the current CLI wants to become something like this:
   
   ```
   usage: daffodil <runtime> <runtime subcommands/options>
   
   daffodil runtime1 parse --schema foo.dfdl.xsd ...
   daffodil runtime2 --schema foo.dfdl.xsd -copts=-Wfoo --incldir include/
   ```
   Note that different runtimes don't necessarily need to have subcommands. Runtime1 would likely have many of the existing subcommands.
   
   Another option would be to keep the root-level parse/unparse/save-parser subcommands and just have a --runtime option, with different options becoming available depending on which runtime you select. That might be a bit easier and more familiar, but the usage likely gets more complicated since the availability of parameters changes depending on the runtime. I'm not sure how well scallop can handle something like that.
   
   Another option for the save-parser question: to be even more pluggable, it would also be interesting if the Daffodil Scala CLI could ask the different backends for their CLI options.







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r499665371



##########
File path: daffodil-runtime2/src/main/c/common_runtime.c
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more

Review comment:
       Are you suggesting this would be too large to compile along with the generated code?
   
   Looks like JNI code includes compiled libraries in the jar and then uses System.load to load the libraries:
   
   https://stackoverflow.com/questions/2937406/how-to-bundle-a-native-library-and-a-jni-library-inside-a-jar
   
   We can probably distribute the static library in the jar, and then the C generator can extract it when needed. I don't think there's a need for System.load since the runtime isn't actually executing that code. But that suggests the right distribution medium is within the jar. The C headers still need to be included in the jar and extracted, though, so that the generated code can be compiled against them.







[GitHub] [incubator-daffodil] mbeckerle commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
mbeckerle commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498449005



##########
File path: daffodil-core/src/main/scala/org/apache/daffodil/runtime2/Runtime2DataProcessor.scala
##########
@@ -0,0 +1,211 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.runtime2
+
+import java.io.File
+import java.io.InputStream
+import java.io.OutputStream
+
+import org.apache.daffodil.api.DFDL
+import org.apache.daffodil.api.DaffodilTunables
+import org.apache.daffodil.api.DataLocation
+import org.apache.daffodil.api.ValidationMode
+import org.apache.daffodil.externalvars.Binding
+import org.apache.daffodil.processors.Failure
+import org.apache.daffodil.processors.ProcessorResult
+import org.apache.daffodil.processors.Success
+import org.apache.daffodil.processors.VariableMap
+import org.apache.daffodil.processors.WithDiagnosticsImpl
+import org.apache.daffodil.processors.parsers.ParseError
+import org.apache.daffodil.processors.unparsers.UnparseError
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+import os.Pipe
+
+/**
+ * Effectively a scala proxy object that does its work via the underlying C-code.
+ * Will need to consider how to use features of underlying C-code to get infoset,
+ * walk infoset, generate XML for use by TDML tests.
+ */
+class Runtime2DataProcessor(executableFile: os.Path) extends DFDL.DataProcessorBase {
+  /**
+   * Returns a data processor with all the same state, but the validation mode changed to that of the argument.
+   *
+   * Note that the default validation mode is "off", that is, no validation is performed.
+   */
+  override def withValidationMode(mode: ValidationMode.Type): DFDL.DataProcessor = ???
+
+  override def withTunable(name: String, value: String): DFDL.DataProcessor = ???
+
+  override def withTunables(tunables: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Map[String, String]): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: File): DFDL.DataProcessor = ???
+
+  override def withExternalVariables(extVars: Seq[Binding]): DFDL.DataProcessor = ???
+
+  override def validationMode: ValidationMode.Type = ???
+
+  override def getTunables(): DaffodilTunables = ???
+
+  override def save(output: DFDL.Output): Unit = ???
+
+  override def variableMap: VariableMap = ???
+
+  override def setValidationMode(mode: ValidationMode.Type): Unit = ???
+
+  override def setExternalVariables(extVars: Map[String, String]): Unit = ???
+
+  override def setExternalVariables(extVars: File): Unit = ???
+
+  override def setExternalVariables(extVars: File, tunable: DaffodilTunables): Unit = ???
+
+  override def setExternalVariables(extVars: Seq[Binding]): Unit = ???
+
+  override def setTunable(tunable: String, value: String): Unit = ???
+
+  override def setTunables(tunables: Map[String, String]): Unit = ???
+
+  /**
+   * Returns an object which contains the result, and/or diagnostic information.
+   */
+  def parse(input: InputStream): ParseResult = {

Review comment:
       Agreed, we need to think this through.
   
   Some requirements:
   * Use an alternate back-end but still run TDML tests against it interactively, with C code generation, C compilation, and invocation of the resulting executable all transparent to the user.
   
   That requirement is what motivates the runtime2 library having a walker that outputs XML and inputs XML, and a TDML processor that glues it all together. It also motivates creating things like DataProcessor: even though the concept makes little sense for runtime2, it allows more code to be reused when building the test interface for TDML.
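
To make the "walker that outputs XML" idea concrete, here is a minimal sketch in C. The `IntElement` and `Record` types and the `record_to_xml` function are hypothetical names invented for illustration; they are not the types or functions in this pull request. The sketch assumes a flat record of xs:int elements and walks it once, writing XML text into a caller-supplied buffer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical infoset for a flat record of xs:int elements.
 * These types are illustrative only, not the PR's actual types. */
typedef struct {
    const char *name;   /* element name taken from the schema */
    int32_t     value;  /* parsed xs:int value */
} IntElement;

typedef struct {
    const char       *name;      /* name of the enclosing element */
    const IntElement *children;  /* child elements in schema order */
    size_t            count;     /* number of children */
} Record;

/* Walk the record and write it as XML into out (capacity cap bytes).
 * Returns the number of characters written (excluding the trailing
 * NUL), or -1 if the buffer is too small. */
int record_to_xml(const Record *rec, char *out, size_t cap)
{
    assert(rec != NULL && out != NULL);
    size_t used = 0;

    int n = snprintf(out, cap, "<%s>", rec->name);
    if (n < 0 || (size_t)n >= cap) return -1;
    used += (size_t)n;

    for (size_t i = 0; i < rec->count; i++) {
        const IntElement *e = &rec->children[i];
        n = snprintf(out + used, cap - used, "<%s>%" PRId32 "</%s>",
                     e->name, e->value, e->name);
        if (n < 0 || (size_t)n >= cap - used) return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, cap - used, "</%s>", rec->name);
    if (n < 0 || (size_t)n >= cap - used) return -1;
    used += (size_t)n;

    return (int)used;
}
```

A TDML processor could compare text produced this way against the expected infoset in the test case. A real walker would also need to handle nested elements, namespaces, and XML escaping, which this sketch leaves out.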
   
   Given that requirement, the same mechanisms can give the CLI the same "snap compiling" capability, so that daffodil parse and daffodil unparse work alike across the back-ends.
   
   A separate requirement is to be able to generate a ".a" or ".so" C-callable library, with matching ".h" include files, that implements parse and unparse C APIs. The generated code would be one library, distinct from the static runtime2 library (which is likely to become quite large eventually). Creating this generated library feels like what the CLI should do when it uses the runtime2 backend and the save parser option is invoked.
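
As a rough illustration of what a C-callable parse/unparse API might look like, the sketch below handles the one case this pull request currently supports: a 32-bit big-endian binary integer. The `rt2_` names and the status enum are assumptions made up for this example; the API actually generated by runtime2 may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical API shape; the names are illustrative only and are not
 * taken from the generated code. In a real library the declarations
 * would live in a ".h" file and the bodies in the ".a" or ".so". */
typedef enum {
    RT2_OK = 0,
    RT2_ERR_SHORT_BUFFER = 1
} rt2_status;

/* Parse one 32-bit big-endian signed integer from data[0..len). */
rt2_status rt2_parse_int32_be(const uint8_t *data, size_t len, int32_t *out)
{
    assert(data != NULL && out != NULL);
    if (len < 4) return RT2_ERR_SHORT_BUFFER;

    /* Assemble the bytes most-significant first, independent of the
     * host's own byte order. */
    uint32_t bits = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
                    ((uint32_t)data[2] << 8)  |  (uint32_t)data[3];

    /* The conversion to int32_t assumes a two's complement platform. */
    *out = (int32_t)bits;
    return RT2_OK;
}

/* Unparse one signed integer as 4 big-endian bytes into data[0..len). */
rt2_status rt2_unparse_int32_be(int32_t value, uint8_t *data, size_t len)
{
    assert(data != NULL);
    if (len < 4) return RT2_ERR_SHORT_BUFFER;

    uint32_t bits = (uint32_t)value;
    data[0] = (uint8_t)(bits >> 24);
    data[1] = (uint8_t)(bits >> 16);
    data[2] = (uint8_t)(bits >> 8);
    data[3] = (uint8_t)bits;
    return RT2_OK;
}
```

Working on plain byte buffers and returning a status code, with no allocation, is one way to keep such an API usable on embedded devices with limited memory, which is the stated target for the generated code.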







[GitHub] [incubator-daffodil] stevedlawrence commented on a change in pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on a change in pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#discussion_r498364697



##########
File path: daffodil-runtime2/src/main/c/common_runtime.c
##########
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more

Review comment:
       A thought: instead of having this cc plugin and putting code in src/main/c/ to compile everything into a static lib, what if we just put these c/h files in src/main/resources/ and don't compile them as part of the sbt build? Then, when we generate code for a DFDL schema, we copy these files out of the jar and compile them along with the generated files. This removes any complexity in the sbt config about cc or libraries, and treats the static code like normal resources, making the sbt config much simpler.
   
   The downside is that compiling generated code would perhaps take a little longer, since every schema would need to compile these same files, but perhaps the compilation is fast enough? And I suspect that as this is extended to support more complex schemas, the generated code will outgrow the static code in size. If compilation time does become an issue, the compiling part of the code generator could build the static lib separately and cache it for reuse with other DFDL schemas.
   
   Another downside is that you can't easily compile and test just the static parts, but I figure the bulk of testing needs to be done with the generated code to ensure correctness, which will implicitly test the static code.
   







[GitHub] [incubator-daffodil] tuxji closed pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
tuxji closed pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422


   





[GitHub] [incubator-daffodil] stevedlawrence commented on pull request #422: WIP: Add runtime2 backend with C code generator

Posted by GitBox <gi...@apache.org>.
stevedlawrence commented on pull request #422:
URL: https://github.com/apache/incubator-daffodil/pull/422#issuecomment-731337557


   +1, agreed, this is certainly ready to be merged to the runtime2-2202 branch. I like where this is heading, and more visibility will help weed out any issues and hopefully gain more traction.

