Posted to dev@daffodil.apache.org by "Interrante, John A (GE Research, US)" <in...@research.ge.com> on 2020/10/05 18:11:48 UTC

Keep compiled C code or throw it away?

The timing of when to compile the C source files that we will be adding to the Daffodil source tree is another topic I would like to discuss on the dev list.  I am using an sbt C compiler plugin in my runtime2 pull request to allow Daffodil's sbt build to compile C source files as well as Scala source files.  We would have to include both the libraries built by the C compiler (there would be several, not just one, as Mike pointed out) and some corresponding C header/source files in a Daffodil distribution and/or the output directory of a "daffodil generate C" command.
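
For reference, a minimal sketch of how an sbt build can drive a C compiler with a custom task; this is not the actual plugin wiring in the pull request, and the task name, source layout, and compiler flags are placeholders:

// build.sbt (sketch): compile C sources under src/main/c into a static library
import scala.sys.process._

val compileC = taskKey[File]("Compile the runtime's C sources into a static library")

compileC := {
  val srcDir  = baseDirectory.value / "src" / "main" / "c"
  val outDir  = target.value / "c"
  IO.createDirectory(outDir)
  val sources = (srcDir ** "*.c").get
  val objects = sources.map { src =>
    val obj = outDir / (src.base + ".o")
    // -I picks up the .h files that ship next to the .c files
    Seq("cc", "-c", "-O2", s"-I$srcDir", "-o", obj.getPath, src.getPath).!
    obj
  }
  val lib = outDir / "libruntime.a"
  (Seq("ar", "rcs", lib.getPath) ++ objects.map(_.getPath)).!
  lib
}

// run the C build as part of `sbt compile`
Compile / compile := (Compile / compile).dependsOn(compileC).value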

The current discussion in the pull request is now wavering between:

  1) Build the C libraries and distribute them with daffodil in its daffodil/include and daffodil/lib directories
  2) Build the C libraries, put them along with source files in a jar, and distribute the jar with Daffodil
  3) Put just the C source files in a jar and distribute the jar with Daffodil; the "daffodil generate C" and "daffodil test <.tdml>" commands will snap compile and/or execute the C files 

The question comes down to this: what is the best time to build the C source files?  

  - Before distribution: This allows us to verify that C source files build and we can test them before we distribute them
  - After distribution: We simplify the sbt build and don't need to build multiple daffodil distributions for different platforms

Are there other choices too?  Actually, I think we need to do BOTH.  We can fix compilation errors more quickly if we can build C source files immediately after editing them.  We also need to test the C code by running TDML tests every time we run sbt test or sbt c-generator/test, which implies we need to build the C source files before distribution as well as after distribution.  However, throwing away the C-code libraries at distribution time does mean that we need to compile 50K lines of C code possibly multiple times, or cache built C libraries somewhere, in order to improve the user's experience.

So the question really is this - do we want to throw away the compiled libraries (".a" files) and distribute only the C source code in platform-independent jars, or distribute compiled machine binary files along with the C source files in or with the platform-independent jars?
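
For what it's worth, the mechanics of option 3 are straightforward on the JVM side.  A rough sketch of extracting bundled C sources from a jar on the classpath into a working directory before invoking the compiler (the resource paths and file names are made up for illustration):

import java.io.{File, InputStream}
import java.nio.file.Files

object ExtractCSources {
  // "/c/runtime/..." are placeholder resource paths, not the real jar layout.
  def extract(resourcePath: String, workDir: File): File = {
    val out = new File(workDir, new File(resourcePath).getName)
    val in: InputStream = getClass.getResourceAsStream(resourcePath)
    try Files.copy(in, out.toPath) finally in.close()
    out
  }

  def main(args: Array[String]): Unit = {
    val workDir = Files.createTempDirectory("daffodil-c").toFile
    Seq("/c/runtime/parse.c", "/c/runtime/unparse.c", "/c/runtime/runtime.h")
      .foreach(extract(_, workDir))
    // ...then run cc/make on workDir, exactly as "daffodil generate C" would
  }
}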

-----Original Message-----
From: Steve Lawrence <sl...@apache.org> 
Sent: Monday, October 5, 2020 10:49 AM
To: dev@daffodil.apache.org
Subject: EXT: Re: Subproject names proposed for discussion

A handful of unrelated thoughts, maybe overthinking things, and I don't feel strongly about anything below, but renaming is always a pain so it'd be nice to ensure we have something future-proof.

1) Is there any benefit organizationally to having all backends be in the same directory?

2) From a sorting perspective, it'd be nice if the scala projects were together, so having it be scala-parser and scala-unparser rather than parser-scala and unparser-scala has advantages.

3) Maybe the scala parser/unparser should be considered the same "scala"
runtime, and so parser/unparser should be subdirectories of a "daffodil-backend-scala" subdirectory?

4) Is there even a benefit to separating parser/unparser into separate jars? There's so much shared logic between the two, and there's even a bunch of unparsing stuff in the parser jar. Should we just combine them under the same backend?

Taking all of the above into account, perhaps something like this:

...
|-- daffodil-backends
|   |-- daffodil-scala
|   |   `-- src
|   `-- daffodil-generator-c
|       `-- src
|-- daffodil-lib
|   `-- src
|-- daffodil-schema-compiler
|   `-- src
...

5) Is there something better than "backend" for describing these? I can't think of anything. Does the DFDL spec have a concept of this?

6) Are there any benefits to using "codenames"? My thinking is maybe someday there could be multiple "scala" backends with different goals/extensions, and so "daffodil-scala" is too generic. Codenames would be more like what we have today, except real code names might be easier to remember than "runtime1" and "runtime2". Disadvantage is there's less discoverability, but a README could be added with short descriptions about what the backends try to accomplish. Not sure I like this, but thought I'd throw it out there.
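
As a concrete illustration of the tree above, the sbt wiring might look roughly like this (project ids and dependency edges are assumptions, not a proposal):

// build.sbt (sketch): one project per directory in the proposed layout
lazy val lib = (project in file("daffodil-lib"))

lazy val schemaCompiler = (project in file("daffodil-schema-compiler"))
  .dependsOn(lib)

lazy val backendScala = (project in file("daffodil-backends/daffodil-scala"))
  .dependsOn(lib, schemaCompiler)

lazy val generatorC = (project in file("daffodil-backends/daffodil-generator-c"))
  .dependsOn(lib, schemaCompiler)

lazy val root = (project in file("."))
  .aggregate(lib, schemaCompiler, backendScala, generatorC)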



On 10/5/20 10:23 AM, Beckerle, Mike wrote:
> +1 from me.
> 
> ________________________________
> From: Interrante, John A (GE Research, US) <in...@research.ge.com>
> Sent: Monday, October 5, 2020 9:28 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Subproject names proposed for discussion
> 
> Steve Lawrence and I would like to bring a topic to the dev list for discussion since not everyone is paying attention to the review of my runtime2 pull request.  Steve suggested, and I agree, that renaming some of the Daffodil subprojects might make their meanings more obvious to newcomer devs.  If we do rename some subprojects after discussing it on this list, we will do it immediately in its own pull request since mixing changes with renames makes it difficult to see which changes are just renames instead of actual changes.
> 
> What do devs think about us renaming some subprojects like this?
> 
>     rename daffodil-core to daffodil-schema-compiler
>     leave daffodil-lib alone
>     rename daffodil-runtime1 to daffodil-backend-parser-scala
>     rename daffodil-runtime1-unparser to daffodil-backend-unparser-scala
>     rename daffodil-runtime2 to daffodil-backend-generator-c
> 
> 


Re: Keep compiled C code or throw it away?

Posted by Steve Lawrence <sl...@apache.org>.
Understood. We only plan to include .c/.h files in the source release.
If we do distribute a precompiled binary, it would only be in a
convenience jar/zip/tar. Though, it sounds like we're leaning towards
not distributing any pre-compiled code at all, and it will always be
compiled on the target machine, with the build configuration allowing a
simple way to build the lib for dev testing purposes.



Re: Keep compiled C code or throw it away?

Posted by Dave Fisher <wa...@comcast.net>.
However the community decides, Apache open source releases must not include binary code.

Binary convenience releases made by the community can be made in addition to the source release.

Sent from my iPhone



RE: Keep compiled C code or throw it away?

Posted by "Interrante, John A (GE Research, US)" <in...@research.ge.com>.
Yes, I have implemented a Runtime2TDMLDFDLProcessor and it works as Steve described, although it delegates some of the actual work to a Runtime2DataProcessor.  I already had to implement generating, compiling, linking, and running the generated C code in the Runtime2DataProcessor and Runtime2TDMLDFDLProcessor classes.  The parts that I may change based on the pull request comments and dev list discussion are:

1) No need to provide a Scala parser/unparser API to the "daffodil parse" and "daffodil unparse" CLI commands (merge Runtime2DataProcessor into Runtime2TDMLProcessor).

2) Continue compiling C source files with sbt for the sake of quicker compile/edit/fix cycles, but don't include machine binary files in the Daffodil distribution (put only C source and header files in a distribution jar).

3) Make the C code generator and Runtime2TDMLProcessor logic work the same regardless of when and where the code is called (on a dev's computer before building a distribution, or after a distribution is installed on a user's computer).

4) Change the snap compilation, linking, and execution implementation to extract all C source and header files from jars and use caching where possible (in the user's XDG_CACHE_HOME, using ideas from NixOS for building immutable, reproducible package stores; see the sketch below).

5) Use the same jar extraction, snap compilation, caching where possible, linking, and execution implementation whether invoked by the IDE running Scala test cases, sbt test, the "daffodil test <tdml-file>" command line, or the "daffodil generate C" command line.  

Does all that sound good?  
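
Regarding item 4, a rough sketch of what the cache lookup could look like; the object name, cache layout, and the "daffodil-parser" artifact name are all illustrative, and only the XDG_CACHE_HOME location and content-hash keying come from the item itself:

import java.io.File
import java.nio.file.Files
import java.security.MessageDigest

object CCodeCache {
  // Cache root follows XDG conventions; falls back to ~/.cache
  private val cacheRoot: File = {
    val xdg = sys.env.getOrElse("XDG_CACHE_HOME", sys.props("user.home") + "/.cache")
    new File(xdg, "daffodil/runtime2")
  }

  // Key the cache on the exact bytes of the extracted/generated C sources,
  // so any change to the sources (or the generator) produces a new entry.
  private def keyFor(cSources: Seq[File]): String = {
    val md = MessageDigest.getInstance("SHA-256")
    cSources.sortBy(_.getName).foreach(f => md.update(Files.readAllBytes(f.toPath)))
    md.digest().map("%02x".format(_)).mkString
  }

  // Return the cached binary if present; otherwise compile (function supplied
  // by the caller) into the cache entry directory and return the result.
  def getOrCompile(cSources: Seq[File])(compile: File => File): File = {
    val entryDir = new File(cacheRoot, keyFor(cSources))
    val cached = new File(entryDir, "daffodil-parser") // name is illustrative
    if (cached.exists()) cached
    else {
      Files.createDirectories(entryDir.toPath)
      compile(entryDir)
    }
  }
}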




Re: Keep compiled C code or throw it away?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
re: new TDMLDFDLProcessor is needed. This exists and I think works as you have mentioned.




Re: Keep compiled C code or throw it away?

Posted by Steve Lawrence <sl...@apache.org>.
> This would work quite like daffodil-propgen then. Just at test-compile
time, not regular compile time.

That requires an sbt source/resource generator, which means it depends
on the sbt configuration in order to test. It might make testing with other
IDEs more difficult. It also means something like the "daffodil test"
CLI command couldn't work, since that doesn't use sbt.

What if we just create a new TDMLDFDLProcessor that is specific to the
new C generator backend? This new TDMLDFDLProcessor can generate code
based on the schema being tested, compile the schemas (using caching
where possible), and execute whatever is compiled to parse/unparse,
capture the result, and return it as a Parse/UnparseResult that the
TDMLDaffodilProcessor can use. This TDMLDFDLProcessor essentially mimics
how a normal user would use it, just like the current TDMLDFDLProcessor
does.

This is analogous to how the IBM DFDL implementation works. This
TDMLDFDLProcessor just happens to use the same Daffodil frontend but with
a different Daffodil backend.
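
To make that shape concrete, a much-simplified sketch of such an adapter; the trait and method names below are stand-ins for illustration only, not Daffodil's actual TDMLDFDLProcessor API:

import java.io.File

// Stand-in for the real TDML processor interface -- names are illustrative only.
trait SimpleTdmlProcessor {
  def parse(schema: File, data: Array[Byte]): Either[String, String] // error or infoset XML
}

// Adapter that mimics how a user of the C backend would work:
// generate C from the schema, compile it (with caching), run it, collect the result.
class CGeneratorTdmlProcessor(
    generate: File => File,            // schema -> directory of generated C code
    compile: File => File,             // generated C -> executable (cached elsewhere)
    run: (File, Array[Byte]) => String // executable + data -> infoset XML
) extends SimpleTdmlProcessor {
  override def parse(schema: File, data: Array[Byte]): Either[String, String] = {
    try {
      val cDir = generate(schema)
      val exe  = compile(cDir)
      Right(run(exe, data))
    } catch {
      case e: Exception => Left(e.getMessage)
    }
  }
}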




Re: Keep compiled C code or throw it away?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
There are 3 different kinds of code in Daffodil:

1) Static code that humans write: compiler code, runtime code, and test rig code. This includes Scala, Java, TDML, and soon enough, C code.

2) Generated code that becomes part of Daffodil itself. This is produced by code in the daffodil-propgen library and creates src-managed and resource-managed code and resources in daffodil-lib.

The above are taken care of by sbt, whether Scala, Java, or C code.

None of the above has anything to do with a DFDL schema created by a user.

3) The C-code generator creates C code from a user's schema.

I would expect that generator to perhaps lay down not just the C code, but make/build files so the user can build and run their code stand-alone.

But I think of this as 100% separate from Daffodil's build.sbt build system. It could even use sbt, but it's not Daffodil's build.

The place where things get confusing is that in order to test (3) above, we need to incorporate generating, compiling, linking, and running the generated C code into daffodil's build, for testing purposes.

So I think, as a part of Daffodil's build, analogous to how daffodil-propgen puts Scala code into daffodil-lib/src-managed/scala/..., the C-code generator's src/test/scala code can be used to put C code into daffodil-runtime2/test-managed/C/....

That test-managed/C code would only be for test, but sbt would see it there and compile it almost as if it were hand-written C code.

This would work quite like daffodil-propgen then. Just at test-compile time, not regular compile time.

Does that make sense?
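
For illustration, a sketch of that test-time generation step in sbt terms; the task name and output path are placeholders, and the real build would invoke the C-code generator against the test schemas:

// build.sbt (sketch): a Test-scoped generator that drops C files into a
// managed directory before test compilation, roughly the mechanism described above.
val generateTestCCode = taskKey[Seq[File]]("Generate C code for TDML tests")

generateTestCCode := {
  val outDir = target.value / "test-managed" / "c"
  IO.createDirectory(outDir)
  // In the real build this would run the C-code generator against the test
  // schemas and write .c/.h files into outDir; here we only collect them.
  (outDir ** "*.c").get
}

Test / resourceGenerators += generateTestCCode.taskValue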


