You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@daffodil.apache.org by Steve Lawrence <sl...@apache.org> on 2020/10/07 22:17:45 UTC

Re: Daffodil-1300

So why do we recommend initialize the runners in a singleton object and
having custom logic to clean them up when the test finishes? Why not
just put them in the class and let them be cleaned up by the GC when the
test finishes?

On 10/7/20 6:11 PM, Beckerle, Mike wrote:
> The Runner objects are normally members of a scala singleton object. That way they are constructed once and shared by all the tests.
> 
> My understanding is once a singleton object is required, it is constructed, and then never goes away for the life of the JVM (actually the class loader, but those aren't released typically either, so life of the JVM) (per https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
> 
> So as Runners are usually val or lazy val of these singletons, they will never go out of scope.
> 
> 
> 
> 
> 
> 
> 
> 
> ________________________________
> From: Carlson, Ian <ic...@owlcyberdefense.com>
> Sent: Wednesday, October 7, 2020 5:51 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: RE: Daffodil-1300
> 
> 
> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal aspect of the Runner and DFDLTestSuite objects has called into question whether the shutdown routines are still needed or not. My understanding is that Scala is a garbage collected language, and should dump objects and classes which are no longer in use – so long as references to them are not being held by still-existing objects.
> 
> 
> 
> The shutdown routine called by the Runner class simply invokes ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. Presumably this removes the reference to the DFDLTestSuite and allows it to be garbage collected, but in current cases, the Runner  containing the DFDLTestSuite will simply go out of scope once the block of tests is completed.
> 
> 
> 
> The Runner objects are constructed as lazy val types, so it is my expectation that they will be constructed only when used – as in the TresysTests.scala file – lines 108-181 – and then be released at the end of that scope.
> 
> 
> 
> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects still actually necessary?
> 
> 
> 
> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
> 
> [Owl Cyber Defense]
> 
> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
> 
> Connect with us!
> 
> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
> 
> 
> 
> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
> 
> 
> 
> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
> 
> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
> 
> If you have received this transmission in error, please notify the sender immediately
> 
> 
> 
> 
> 
> From: Beckerle, Mike<ma...@owlcyberdefense.com>
> Sent: Thursday, October 1, 2020 4:03 PM
> To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
> 
> 
> 
> I think the DFDLTestSuite constructor can't stay deprecated. We still need to construct them.
> 
> It can become package private I think, so not usable outside the package.  I think the scala syntax for that is:
> 
> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
> }
> 
> The deprecation was just to help in hunting down the usages.
> 
> If we need more Runner functions that's ok. But probably we can just depend on Scala's ability to pass arguments by specifying the argument names explicitly and getting defaults for the rest.  I think calls to Runner already do this in many cases.
> 
> 
> 
> 
> 
> ________________________________
> From: Carlson, Ian <ic...@owlcyberdefense.com>
> Sent: Thursday, October 1, 2020 4:14 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Daffodil-1300
> 
> 
> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and replacing them with usage of the “Runner” object. In order to help identify where the DFDLTestSuite was being invoked, I un-commented:
> 
> 
> 
> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala line 198
> 
> @deprecated("2016-12-30", "Use Runner(...) instead."
> 
> 
> 
> This revealed that the Runner object itself invokes “new DFDLTestSuite” when the runOneTest function is called. This seems to defeat the purpose of this task. I’m hoping for some guidance or opinions on when, if ever, it’s appropriate to invoke the DFDLTestSuite constructor.
> 
> 
> 
> Related to this – I believe I’ll need to create more constructors for the Runner object in order to preserve the argument combinations currently invoked by some of the new DFDLTestSuite calls – in particular when compileAllTopLevel must be provided as true when paired with an xml node instead of a file name.
> 
> 
> 
> Is deprecating the DFDLTestSuite constructor still appropriate, or since this ticket was written have we changed or strategy such that it shouldn’t be called directly, but will need to be used by the Runner object?
> 
> 
> 
> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
> 
> [Owl Cyber Defense]
> 
> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
> 
> Connect with us!
> 
> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
> 
> 
> 
> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
> 
> 
> 
> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
> 
> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
> 
> If you have received this transmission in error, please notify the sender immediately
> 
> 
> 
> 
> 
> 


Re: Daffodil-1300

Posted by Steve Lawrence <sl...@apache.org>.
I would argue that implementation="" just means this test isn't known to
work with any implementations. Just like a test that only has
implementation="ibm" means the test doesn't work with daffodil but does
work with IBM. Seems like that could be a reasonable addition if it
doesn't work.

The thing I don't particularly like about something like a "disabled"
attribute is that it's not clear what implementation it's disabled for.

But I think the concept is a great idea. It would be really nice if
there were some official way to provide documentation about why a
particular TDML test doesn't work for different implementations. That
way you can pass around TDML files without the scala files and still
have information about which implementations they are expected to work
with (via implementation tag) and documentation why it doesn't work with
others (via something new).

Having something like this would also make it easy to grep for broken
tests. Right now, you just have to grep through scala  tests files,
which isn't accurate, since many tests might not have a bug number.

It also means we could have a way for the TDML Runner to run all tests
that supposedly don't work with an implementation and see if we've
accidentally fixed them. Right now, to see if old broken tests work you
have to uncomment all the individual tests, which is a pain and why
we've never tried to see if old tests have started working.

Seems like the more information we can have in the TDML files the better.

On 10/9/20 11:46 AM, Beckerle, Mike wrote:
> Right now we also depend on comments like
> 
> // DAFFODIL-1234
> // @Test def test_foo() ...
> 
> As documentation that this test fails due to the issue in that ticket.
> 
> If we change TDML to say implementation="", then we lose the information about
> what implementation it was supposed to work with, which could be more than just daffodil. Of course you can save that info in a comment, but in general this seems clumsier.
> 
> Note: I don't know that implemetation="" is even legal in the TDML runner. You may have to list at least one implementation.
> 
> If we're going to adapt the TDML runner, I'd rather add an explicit attribute for disabling a test e.g., something like
> disabled="DAFFODIL-1234", which if defined, and not empty string, gives a reason the test is not runnable.
> 
> 
> 
> 
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Friday, October 9, 2020 8:06 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
> 
> Fair enough, sounds like we can't really get rid of hardcoded scala
> tests if some depend on that ability.
> 
> The run1 is an interesting idea. It solves the problem I had some times
> where I copy an old test and change the function name but not the string
> name. And does avoid boilerplate. Looks like there is a way to get test
> name information from junit, but it involves even more boilerplate.
> Though, I don't think stack trace is awful
> 
> I think it's just a one liner to get the function that called runOne
> 
>   def runOne: Unit = {
>     val testName = Thread.currentThread.getStackTrace()(2).getMethodName
>     runOneTest(testName)
>   }
> 
> Not sure if there's a performance impact, I believe getStackTrace is
> relatively slow. But I imagine it's nothing compared to what an
> individual test does.
> 
> Another option might be via scala macros, but that wouldn't help anyone
> running from java, and I'm not even positive if it could be made to work.
> 
> As far Brandon's generator suggestion, one issue I see is how to
> distribute that so that it's easily runnable. Ideally this same process
> would be used for DFDL schema repos as well as Daffodil. Presumably it
> would by a class with a Main in daffodil-tdml-lib, and users would need
> to run it via the command line after the dependency is downloaded? Seems
> unintutive for new users. Perhaps an DFDL Schema project sbt plugin is a
> way to automate that? Also depends on a certain project layout, which
> I'm not sure many projects follow.
> 
> I think using implemtation="" would be a good way to define which tests
> work and which don't. Then you don't even need to comment out tests.
> Those tests just get skipped when run.
> 
> On 10/8/20 2:02 PM, Beckerle, Mike wrote:
>> I've depended, since the earliest days of Daffodil, on being able to click on one of the
>>
>>     def test_foo() = { Runner.runOneTest("foo") }
>>
>> And do "debug test" and have it run that single test in the breakpoint debugger.
>> I do this dozens of times a day when working on Daffodil using both eclipse, and using Intellij IDEA.  This is all leveraging IDE support for JUnit.
>>
>> So whatever we do, that has to remain very convenient.
>>
>> The redundancy of having the name of the test twice is annoying.  This would be better:
>>
>>    @Test def foo() = r.run1
>>
>> Where run1 with no arguments instructs it to derive the test name from the method name. That would require naming conventions for tests to be scala identifier-friendly, or mappable.
>>
>> That's as small as I can make it knowing it would remain JUnit friendly.
>>
>> I'm not sure the method name is available to a method by means other than using debugger/backtrace kinds of tricks.
>>
>>
>> ________________________________
>> From: Sloane, Brandon <bs...@owlcyberdefense.com>
>> Sent: Thursday, October 8, 2020 12:30 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>> I've used dynamic tests in a couple of schema projects I was working on.
>>
>> My recollection is that the IDE integration made it difficult to run a single test, as you first need to generate all the tests; and there was not a good way to cache the results.
>>
>> In the case of Daffodil, we also take advantage of the fact that we can comment out specifc tests. I don't think dynamic tests have a good way of doing that (although we can probably implement a test blacklist ourselves).
>>
>> Another approach I have taken that seems to work better is to have a code generator produce the boilerplate test classes. We still need to figure out how to persistently disable tests, but this should avoid IDE integration issues.
>> ________________________________
>> From: Steve Lawrence <sl...@apache.org>
>> Sent: Thursday, October 8, 2020 11:28 AM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>> Skimming the Runner code, I'm not entirely convinced this is all working
>> as expected. But I think Ian is looking into that as part of this bug.
>> Seems like a reasonable time to confirm the complexity of our TDML
>> runner and all the caching is actually working right before updating all
>> the tests to use Runners.
>>
>> That said, I'm wondering if maybe there is a better approach to our
>> testing infrastructure. I wonder if all of our boilerplate like this
>> could be removed:
>>
>>   @Test test_foo = { runner.runOneTest("foo") }
>>
>> This is all pretty much boilerplate.
>>
>> It seems like it should be possible to have some sort of JUnit TestSuite
>> factory that simply scans for all the .tdml files, creates a unique
>> Runner for each tdml file (thus avoiding any duplication issues) and
>> then dynamically creates tests for all the parser/unparserTestCases. All
>> this scala stuff feels unnecessary to me.
>>
>> Optional provided to the Runner (e.g. validation) can become attributes
>> on the tdml:testSuite attribute.
>>
>> And we already have the "implementation" attribute that says which
>> implementations a tdml:parser/unparserTestCase works with. So rather
>> than commenting out tests in the scala file, we can just set
>> implementation="" to state it doesn't work with daffodil.
>>
>> Though, I'm not sure if JUnit actually has a concept of dynamic test
>> suites (it does have dynamic tests), and how nicely that plays with
>> IDEs. I feel like we also shouldn't necessarily be tied to JUnit. While
>> maybe preferred, I would be willing to switch to something else if it
>> has this sort of capability and works with IDEs.
>>
>>
>>
>> On 10/8/20 10:59 AM, Beckerle, Mike wrote:
>>> The point of runners and the object constructing them, was to share the XML-loading of the TDML, and computation of all the test suite objects from it, which was otherwise being loaded repeatedly and computed repeatedly.
>>>
>>> It is sensible to question this. There's a bunch of requirements we may have lost sight of, so let me see if I can review some of them here.
>>>
>>>   *    Some TDML files are *huge* and it is quite slow if the TDML gets loaded (as XML) multiple times. Clearly a TDML file can contain multiple tests and the work to load, parse, compile schemas in, and prepare for tests should happen exactly once per TDML file, regardless of how test drivers are used to invoke the individual tests.
>>>      *   Decomposing these into smaller files is desirable, but shouldn't be necessary.
>>>   *   Test drivers (the scala junit calls) that enable easy testing of a single test of a TDML file are essential to easy debugging. They must work from sbt and from IDEs
>>>   *   JUnit is the preferred idiom, not scalatest
>>>   *   Boilerplate in the scala test driver code is to be minimized. Nothing that doesn't have to be expressed should be expressed, management of objects and their lifetimes shouldn't be required. (This argues that the shutdown code shouldn't be required) The only things that should have to be specified are:
>>>      *   tdml file name and path
>>>      *   Options such as whether to validate the TDML or not, schema or not
>>>      *   for each junit test, the name of the test that appears in the TDML file.
>>>
>>>   *   Tests are generally run on multiple threads. Not just whole TDML suites at a time, but even within one TDML file, many Junit test drivers get kicked off simultaneously on multiple threads. This parallelism is needed to make the test suite run in a reasonable amount of time.
>>>      *   This cannot construct new copies of the entire DFDLTestSuite or the overhead would be too high.
>>>   *   The volume of tests and duration of the runs requires that test data structures such as the DFDLTestSuite objects be incrementally garbage collected.
>>>
>>> Requirements on the TDML runner (DFDLTestSuite and its methods) to behave properly and be coded properly may not be properly articulated.
>>>
>>> The DFDLTestSuite, and any objects it creates such as ParserTestCase and UnparserTestCase, should have state members only for digesting the definition of the test suite and test definitions. (i.e., should be all vals, or lazy vals, not vars, and they should not compute anything that comes from running the test, only from the test definitions.) They should be stateless otherwise.
>>>
>>> I'm not at all sure this is the case.
>>>
>>> There are multiple, possibly overlapping, ad-hoc fixes to these issues.
>>>
>>> There's a schema cache which tries to avoid recompiling the same DFDL schema over and over for various tests defined in the same TDML file. This seems to be needed because tests may use the same DFDL schema and same root element, but this isn't apparent except by examining the test case specifically. So some sort of cache is needed, as the same schema with different config options may be used.
>>>
>>> The Runner object is effectively trying to be a cache of the DFDLTestSuite object so that these are not constructed multiple times.
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Steve Lawrence <sl...@apache.org>
>>> Sent: Wednesday, October 7, 2020 6:17 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Re: Daffodil-1300
>>>
>>> So why do we recommend initialize the runners in a singleton object and
>>> having custom logic to clean them up when the test finishes? Why not
>>> just put them in the class and let them be cleaned up by the GC when the
>>> test finishes?
>>>
>>> On 10/7/20 6:11 PM, Beckerle, Mike wrote:
>>>> The Runner objects are normally members of a scala singleton object. That way they are constructed once and shared by all the tests.
>>>>
>>>> My understanding is once a singleton object is required, it is constructed, and then never goes away for the life of the JVM (actually the class loader, but those aren't released typically either, so life of the JVM) (per https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>>>>
>>>> So as Runners are usually val or lazy val of these singletons, they will never go out of scope.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>>>> Sent: Wednesday, October 7, 2020 5:51 PM
>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>> Subject: RE: Daffodil-1300
>>>>
>>>>
>>>> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal aspect of the Runner and DFDLTestSuite objects has called into question whether the shutdown routines are still needed or not. My understanding is that Scala is a garbage collected language, and should dump objects and classes which are no longer in use – so long as references to them are not being held by still-existing objects.
>>>>
>>>>
>>>>
>>>> The shutdown routine called by the Runner class simply invokes ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. Presumably this removes the reference to the DFDLTestSuite and allows it to be garbage collected, but in current cases, the Runner  containing the DFDLTestSuite will simply go out of scope once the block of tests is completed.
>>>>
>>>>
>>>>
>>>> The Runner objects are constructed as lazy val types, so it is my expectation that they will be constructed only when used – as in the TresysTests.scala file – lines 108-181 – and then be released at the end of that scope.
>>>>
>>>>
>>>>
>>>> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects still actually necessary?
>>>>
>>>>
>>>>
>>>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>>>
>>>> [Owl Cyber Defense]
>>>>
>>>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>>>
>>>> Connect with us!
>>>>
>>>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>>>
>>>>
>>>>
>>>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>>>
>>>>
>>>>
>>>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>>>
>>>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>>>
>>>> If you have received this transmission in error, please notify the sender immediately
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> From: Beckerle, Mike<ma...@owlcyberdefense.com>
>>>> Sent: Thursday, October 1, 2020 4:03 PM
>>>> To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
>>>> Subject: Re: Daffodil-1300
>>>>
>>>>
>>>>
>>>> I think the DFDLTestSuite constructor can't stay deprecated. We still need to construct them.
>>>>
>>>> It can become package private I think, so not usable outside the package.  I think the scala syntax for that is:
>>>>
>>>> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
>>>> }
>>>>
>>>> The deprecation was just to help in hunting down the usages.
>>>>
>>>> If we need more Runner functions that's ok. But probably we can just depend on Scala's ability to pass arguments by specifying the argument names explicitly and getting defaults for the rest.  I think calls to Runner already do this in many cases.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________
>>>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>>>> Sent: Thursday, October 1, 2020 4:14 PM
>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>> Subject: Daffodil-1300
>>>>
>>>>
>>>> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and replacing them with usage of the “Runner” object. In order to help identify where the DFDLTestSuite was being invoked, I un-commented:
>>>>
>>>>
>>>>
>>>> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala line 198
>>>>
>>>> @deprecated("2016-12-30", "Use Runner(...) instead."
>>>>
>>>>
>>>>
>>>> This revealed that the Runner object itself invokes “new DFDLTestSuite” when the runOneTest function is called. This seems to defeat the purpose of this task. I’m hoping for some guidance or opinions on when, if ever, it’s appropriate to invoke the DFDLTestSuite constructor.
>>>>
>>>>
>>>>
>>>> Related to this – I believe I’ll need to create more constructors for the Runner object in order to preserve the argument combinations currently invoked by some of the new DFDLTestSuite calls – in particular when compileAllTopLevel must be provided as true when paired with an xml node instead of a file name.
>>>>
>>>>
>>>>
>>>> Is deprecating the DFDLTestSuite constructor still appropriate, or since this ticket was written have we changed or strategy such that it shouldn’t be called directly, but will need to be used by the Runner object?
>>>>
>>>>
>>>>
>>>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>>>
>>>> [Owl Cyber Defense]
>>>>
>>>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>>>
>>>> Connect with us!
>>>>
>>>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>>>
>>>>
>>>>
>>>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>>>
>>>>
>>>>
>>>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>>>
>>>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>>>
>>>> If you have received this transmission in error, please notify the sender immediately
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 


Re: Daffodil-1300

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
Right now we also depend on comments like

// DAFFODIL-1234
// @Test def test_foo() ...

As documentation that this test fails due to the issue in that ticket.

If we change TDML to say implementation="", then we lose the information about
what implementation it was supposed to work with, which could be more than just daffodil. Of course you can save that info in a comment, but in general this seems clumsier.

Note: I don't know that implemetation="" is even legal in the TDML runner. You may have to list at least one implementation.

If we're going to adapt the TDML runner, I'd rather add an explicit attribute for disabling a test e.g., something like
disabled="DAFFODIL-1234", which if defined, and not empty string, gives a reason the test is not runnable.




________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Friday, October 9, 2020 8:06 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Daffodil-1300

Fair enough, sounds like we can't really get rid of hardcoded scala
tests if some depend on that ability.

The run1 is an interesting idea. It solves the problem I had some times
where I copy an old test and change the function name but not the string
name. And does avoid boilerplate. Looks like there is a way to get test
name information from junit, but it involves even more boilerplate.
Though, I don't think stack trace is awful

I think it's just a one liner to get the function that called runOne

  def runOne: Unit = {
    val testName = Thread.currentThread.getStackTrace()(2).getMethodName
    runOneTest(testName)
  }

Not sure if there's a performance impact, I believe getStackTrace is
relatively slow. But I imagine it's nothing compared to what an
individual test does.

Another option might be via scala macros, but that wouldn't help anyone
running from java, and I'm not even positive if it could be made to work.

As far Brandon's generator suggestion, one issue I see is how to
distribute that so that it's easily runnable. Ideally this same process
would be used for DFDL schema repos as well as Daffodil. Presumably it
would by a class with a Main in daffodil-tdml-lib, and users would need
to run it via the command line after the dependency is downloaded? Seems
unintutive for new users. Perhaps an DFDL Schema project sbt plugin is a
way to automate that? Also depends on a certain project layout, which
I'm not sure many projects follow.

I think using implemtation="" would be a good way to define which tests
work and which don't. Then you don't even need to comment out tests.
Those tests just get skipped when run.

On 10/8/20 2:02 PM, Beckerle, Mike wrote:
> I've depended, since the earliest days of Daffodil, on being able to click on one of the
>
>     def test_foo() = { Runner.runOneTest("foo") }
>
> And do "debug test" and have it run that single test in the breakpoint debugger.
> I do this dozens of times a day when working on Daffodil using both eclipse, and using Intellij IDEA.  This is all leveraging IDE support for JUnit.
>
> So whatever we do, that has to remain very convenient.
>
> The redundancy of having the name of the test twice is annoying.  This would be better:
>
>    @Test def foo() = r.run1
>
> Where run1 with no arguments instructs it to derive the test name from the method name. That would require naming conventions for tests to be scala identifier-friendly, or mappable.
>
> That's as small as I can make it knowing it would remain JUnit friendly.
>
> I'm not sure the method name is available to a method by means other than using debugger/backtrace kinds of tricks.
>
>
> ________________________________
> From: Sloane, Brandon <bs...@owlcyberdefense.com>
> Sent: Thursday, October 8, 2020 12:30 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
>
> I've used dynamic tests in a couple of schema projects I was working on.
>
> My recollection is that the IDE integration made it difficult to run a single test, as you first need to generate all the tests; and there was not a good way to cache the results.
>
> In the case of Daffodil, we also take advantage of the fact that we can comment out specifc tests. I don't think dynamic tests have a good way of doing that (although we can probably implement a test blacklist ourselves).
>
> Another approach I have taken that seems to work better is to have a code generator produce the boilerplate test classes. We still need to figure out how to persistently disable tests, but this should avoid IDE integration issues.
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, October 8, 2020 11:28 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
>
> Skimming the Runner code, I'm not entirely convinced this is all working
> as expected. But I think Ian is looking into that as part of this bug.
> Seems like a reasonable time to confirm the complexity of our TDML
> runner and all the caching is actually working right before updating all
> the tests to use Runners.
>
> That said, I'm wondering if maybe there is a better approach to our
> testing infrastructure. I wonder if all of our boilerplate like this
> could be removed:
>
>   @Test test_foo = { runner.runOneTest("foo") }
>
> This is all pretty much boilerplate.
>
> It seems like it should be possible to have some sort of JUnit TestSuite
> factory that simply scans for all the .tdml files, creates a unique
> Runner for each tdml file (thus avoiding any duplication issues) and
> then dynamically creates tests for all the parser/unparserTestCases. All
> this scala stuff feels unnecessary to me.
>
> Optional provided to the Runner (e.g. validation) can become attributes
> on the tdml:testSuite attribute.
>
> And we already have the "implementation" attribute that says which
> implementations a tdml:parser/unparserTestCase works with. So rather
> than commenting out tests in the scala file, we can just set
> implementation="" to state it doesn't work with daffodil.
>
> Though, I'm not sure if JUnit actually has a concept of dynamic test
> suites (it does have dynamic tests), and how nicely that plays with
> IDEs. I feel like we also shouldn't necessarily be tied to JUnit. While
> maybe preferred, I would be willing to switch to something else if it
> has this sort of capability and works with IDEs.
>
>
>
> On 10/8/20 10:59 AM, Beckerle, Mike wrote:
>> The point of runners and the object constructing them, was to share the XML-loading of the TDML, and computation of all the test suite objects from it, which was otherwise being loaded repeatedly and computed repeatedly.
>>
>> It is sensible to question this. There's a bunch of requirements we may have lost sight of, so let me see if I can review some of them here.
>>
>>   *    Some TDML files are *huge* and it is quite slow if the TDML gets loaded (as XML) multiple times. Clearly a TDML file can contain multiple tests and the work to load, parse, compile schemas in, and prepare for tests should happen exactly once per TDML file, regardless of how test drivers are used to invoke the individual tests.
>>      *   Decomposing these into smaller files is desirable, but shouldn't be necessary.
>>   *   Test drivers (the scala junit calls) that enable easy testing of a single test of a TDML file are essential to easy debugging. They must work from sbt and from IDEs
>>   *   JUnit is the preferred idiom, not scalatest
>>   *   Boilerplate in the scala test driver code is to be minimized. Nothing that doesn't have to be expressed should be expressed, management of objects and their lifetimes shouldn't be required. (This argues that the shutdown code shouldn't be required) The only things that should have to be specified are:
>>      *   tdml file name and path
>>      *   Options such as whether to validate the TDML or not, schema or not
>>      *   for each junit test, the name of the test that appears in the TDML file.
>>
>>   *   Tests are generally run on multiple threads. Not just whole TDML suites at a time, but even within one TDML file, many Junit test drivers get kicked off simultaneously on multiple threads. This parallelism is needed to make the test suite run in a reasonable amount of time.
>>      *   This cannot construct new copies of the entire DFDLTestSuite or the overhead would be too high.
>>   *   The volume of tests and duration of the runs requires that test data structures such as the DFDLTestSuite objects be incrementally garbage collected.
>>
>> Requirements on the TDML runner (DFDLTestSuite and its methods) to behave properly and be coded properly may not be properly articulated.
>>
>> The DFDLTestSuite, and any objects it creates such as ParserTestCase and UnparserTestCase, should have state members only for digesting the definition of the test suite and test definitions. (i.e., should be all vals, or lazy vals, not vars, and they should not compute anything that comes from running the test, only from the test definitions.) They should be stateless otherwise.
>>
>> I'm not at all sure this is the case.
>>
>> There are multiple, possibly overlapping, ad-hoc fixes to these issues.
>>
>> There's a schema cache which tries to avoid recompiling the same DFDL schema over and over for various tests defined in the same TDML file. This seems to be needed because tests may use the same DFDL schema and same root element, but this isn't apparent except by examining the test case specifically. So some sort of cache is needed, as the same schema with different config options may be used.
>>
>> The Runner object is effectively trying to be a cache of the DFDLTestSuite object so that these are not constructed multiple times.
>>
>>
>>
>>
>> ________________________________
>> From: Steve Lawrence <sl...@apache.org>
>> Sent: Wednesday, October 7, 2020 6:17 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>> So why do we recommend initialize the runners in a singleton object and
>> having custom logic to clean them up when the test finishes? Why not
>> just put them in the class and let them be cleaned up by the GC when the
>> test finishes?
>>
>> On 10/7/20 6:11 PM, Beckerle, Mike wrote:
>>> The Runner objects are normally members of a scala singleton object. That way they are constructed once and shared by all the tests.
>>>
>>> My understanding is once a singleton object is required, it is constructed, and then never goes away for the life of the JVM (actually the class loader, but those aren't released typically either, so life of the JVM) (per https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>>>
>>> So as Runners are usually val or lazy val of these singletons, they will never go out of scope.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>>> Sent: Wednesday, October 7, 2020 5:51 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: RE: Daffodil-1300
>>>
>>>
>>> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal aspect of the Runner and DFDLTestSuite objects has called into question whether the shutdown routines are still needed or not. My understanding is that Scala is a garbage collected language, and should dump objects and classes which are no longer in use – so long as references to them are not being held by still-existing objects.
>>>
>>>
>>>
>>> The shutdown routine called by the Runner class simply invokes ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. Presumably this removes the reference to the DFDLTestSuite and allows it to be garbage collected, but in current cases, the Runner  containing the DFDLTestSuite will simply go out of scope once the block of tests is completed.
>>>
>>>
>>>
>>> The Runner objects are constructed as lazy val types, so it is my expectation that they will be constructed only when used – as in the TresysTests.scala file – lines 108-181 – and then be released at the end of that scope.
>>>
>>>
>>>
>>> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects still actually necessary?
>>>
>>>
>>>
>>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>>
>>> [Owl Cyber Defense]
>>>
>>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>>
>>> Connect with us!
>>>
>>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>>
>>>
>>>
>>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>>
>>>
>>>
>>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>>
>>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>>
>>> If you have received this transmission in error, please notify the sender immediately
>>>
>>>
>>>
>>>
>>>
>>> From: Beckerle, Mike<ma...@owlcyberdefense.com>
>>> Sent: Thursday, October 1, 2020 4:03 PM
>>> To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
>>> Subject: Re: Daffodil-1300
>>>
>>>
>>>
>>> I think the DFDLTestSuite constructor can't stay deprecated. We still need to construct them.
>>>
>>> It can become package private I think, so not usable outside the package.  I think the scala syntax for that is:
>>>
>>> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
>>> }
>>>
>>> The deprecation was just to help in hunting down the usages.
>>>
>>> If we need more Runner functions that's ok. But probably we can just depend on Scala's ability to pass arguments by specifying the argument names explicitly and getting defaults for the rest.  I think calls to Runner already do this in many cases.
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>>> Sent: Thursday, October 1, 2020 4:14 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Daffodil-1300
>>>
>>>
>>> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and replacing them with usage of the “Runner” object. In order to help identify where the DFDLTestSuite was being invoked, I un-commented:
>>>
>>>
>>>
>>> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala line 198
>>>
>>> @deprecated("2016-12-30", "Use Runner(...) instead."
>>>
>>>
>>>
>>> This revealed that the Runner object itself invokes “new DFDLTestSuite” when the runOneTest function is called. This seems to defeat the purpose of this task. I’m hoping for some guidance or opinions on when, if ever, it’s appropriate to invoke the DFDLTestSuite constructor.
>>>
>>>
>>>
>>> Related to this – I believe I’ll need to create more constructors for the Runner object in order to preserve the argument combinations currently invoked by some of the new DFDLTestSuite calls – in particular when compileAllTopLevel must be provided as true when paired with an xml node instead of a file name.
>>>
>>>
>>>
>>> Is deprecating the DFDLTestSuite constructor still appropriate, or since this ticket was written have we changed or strategy such that it shouldn’t be called directly, but will need to be used by the Runner object?
>>>
>>>
>>>
>>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>>
>>> [Owl Cyber Defense]
>>>
>>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>>
>>> Connect with us!
>>>
>>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>>
>>>
>>>
>>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>>
>>>
>>>
>>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>>
>>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>>
>>> If you have received this transmission in error, please notify the sender immediately
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>


Re: Daffodil-1300

Posted by Steve Lawrence <sl...@apache.org>.
Fair enough, sounds like we can't really get rid of hardcoded scala
tests if some depend on that ability.

The run1 is an interesting idea. It solves the problem I had some times
where I copy an old test and change the function name but not the string
name. And does avoid boilerplate. Looks like there is a way to get test
name information from junit, but it involves even more boilerplate.
Though, I don't think stack trace is awful

I think it's just a one liner to get the function that called runOne

  def runOne: Unit = {
    val testName = Thread.currentThread.getStackTrace()(2).getMethodName
    runOneTest(testName)
  }

Not sure if there's a performance impact, I believe getStackTrace is
relatively slow. But I imagine it's nothing compared to what an
individual test does.

Another option might be via scala macros, but that wouldn't help anyone
running from java, and I'm not even positive if it could be made to work.

As far Brandon's generator suggestion, one issue I see is how to
distribute that so that it's easily runnable. Ideally this same process
would be used for DFDL schema repos as well as Daffodil. Presumably it
would by a class with a Main in daffodil-tdml-lib, and users would need
to run it via the command line after the dependency is downloaded? Seems
unintutive for new users. Perhaps an DFDL Schema project sbt plugin is a
way to automate that? Also depends on a certain project layout, which
I'm not sure many projects follow.

I think using implemtation="" would be a good way to define which tests
work and which don't. Then you don't even need to comment out tests.
Those tests just get skipped when run.

On 10/8/20 2:02 PM, Beckerle, Mike wrote:
> I've depended, since the earliest days of Daffodil, on being able to click on one of the
> 
>     def test_foo() = { Runner.runOneTest("foo") }
> 
> And do "debug test" and have it run that single test in the breakpoint debugger.
> I do this dozens of times a day when working on Daffodil using both eclipse, and using Intellij IDEA.  This is all leveraging IDE support for JUnit.
> 
> So whatever we do, that has to remain very convenient.
> 
> The redundancy of having the name of the test twice is annoying.  This would be better:
> 
>    @Test def foo() = r.run1
> 
> Where run1 with no arguments instructs it to derive the test name from the method name. That would require naming conventions for tests to be scala identifier-friendly, or mappable.
> 
> That's as small as I can make it knowing it would remain JUnit friendly.
> 
> I'm not sure the method name is available to a method by means other than using debugger/backtrace kinds of tricks.
> 
> 
> ________________________________
> From: Sloane, Brandon <bs...@owlcyberdefense.com>
> Sent: Thursday, October 8, 2020 12:30 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
> 
> I've used dynamic tests in a couple of schema projects I was working on.
> 
> My recollection is that the IDE integration made it difficult to run a single test, as you first need to generate all the tests; and there was not a good way to cache the results.
> 
> In the case of Daffodil, we also take advantage of the fact that we can comment out specifc tests. I don't think dynamic tests have a good way of doing that (although we can probably implement a test blacklist ourselves).
> 
> Another approach I have taken that seems to work better is to have a code generator produce the boilerplate test classes. We still need to figure out how to persistently disable tests, but this should avoid IDE integration issues.
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, October 8, 2020 11:28 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
> 
> Skimming the Runner code, I'm not entirely convinced this is all working
> as expected. But I think Ian is looking into that as part of this bug.
> Seems like a reasonable time to confirm the complexity of our TDML
> runner and all the caching is actually working right before updating all
> the tests to use Runners.
> 
> That said, I'm wondering if maybe there is a better approach to our
> testing infrastructure. I wonder if all of our boilerplate like this
> could be removed:
> 
>   @Test test_foo = { runner.runOneTest("foo") }
> 
> This is all pretty much boilerplate.
> 
> It seems like it should be possible to have some sort of JUnit TestSuite
> factory that simply scans for all the .tdml files, creates a unique
> Runner for each tdml file (thus avoiding any duplication issues) and
> then dynamically creates tests for all the parser/unparserTestCases. All
> this scala stuff feels unnecessary to me.
> 
> Optional provided to the Runner (e.g. validation) can become attributes
> on the tdml:testSuite attribute.
> 
> And we already have the "implementation" attribute that says which
> implementations a tdml:parser/unparserTestCase works with. So rather
> than commenting out tests in the scala file, we can just set
> implementation="" to state it doesn't work with daffodil.
> 
> Though, I'm not sure if JUnit actually has a concept of dynamic test
> suites (it does have dynamic tests), and how nicely that plays with
> IDEs. I feel like we also shouldn't necessarily be tied to JUnit. While
> maybe preferred, I would be willing to switch to something else if it
> has this sort of capability and works with IDEs.
> 
> 
> 
> On 10/8/20 10:59 AM, Beckerle, Mike wrote:
>> The point of runners and the object constructing them, was to share the XML-loading of the TDML, and computation of all the test suite objects from it, which was otherwise being loaded repeatedly and computed repeatedly.
>>
>> It is sensible to question this. There's a bunch of requirements we may have lost sight of, so let me see if I can review some of them here.
>>
>>   *    Some TDML files are *huge* and it is quite slow if the TDML gets loaded (as XML) multiple times. Clearly a TDML file can contain multiple tests and the work to load, parse, compile schemas in, and prepare for tests should happen exactly once per TDML file, regardless of how test drivers are used to invoke the individual tests.
>>      *   Decomposing these into smaller files is desirable, but shouldn't be necessary.
>>   *   Test drivers (the scala junit calls) that enable easy testing of a single test of a TDML file are essential to easy debugging. They must work from sbt and from IDEs
>>   *   JUnit is the preferred idiom, not scalatest
>>   *   Boilerplate in the scala test driver code is to be minimized. Nothing that doesn't have to be expressed should be expressed, management of objects and their lifetimes shouldn't be required. (This argues that the shutdown code shouldn't be required) The only things that should have to be specified are:
>>      *   tdml file name and path
>>      *   Options such as whether to validate the TDML or not, schema or not
>>      *   for each junit test, the name of the test that appears in the TDML file.
>>
>>   *   Tests are generally run on multiple threads. Not just whole TDML suites at a time, but even within one TDML file, many Junit test drivers get kicked off simultaneously on multiple threads. This parallelism is needed to make the test suite run in a reasonable amount of time.
>>      *   This cannot construct new copies of the entire DFDLTestSuite or the overhead would be too high.
>>   *   The volume of tests and duration of the runs requires that test data structures such as the DFDLTestSuite objects be incrementally garbage collected.
>>
>> Requirements on the TDML runner (DFDLTestSuite and its methods) to behave properly and be coded properly may not be properly articulated.
>>
>> The DFDLTestSuite, and any objects it creates such as ParserTestCase and UnparserTestCase, should have state members only for digesting the definition of the test suite and test definitions. (i.e., should be all vals, or lazy vals, not vars, and they should not compute anything that comes from running the test, only from the test definitions.) They should be stateless otherwise.
>>
>> I'm not at all sure this is the case.
>>
>> There are multiple, possibly overlapping, ad-hoc fixes to these issues.
>>
>> There's a schema cache which tries to avoid recompiling the same DFDL schema over and over for various tests defined in the same TDML file. This seems to be needed because tests may use the same DFDL schema and same root element, but this isn't apparent except by examining the test case specifically. So some sort of cache is needed, as the same schema with different config options may be used.
>>
>> The Runner object is effectively trying to be a cache of the DFDLTestSuite object so that these are not constructed multiple times.
>>
>>
>>
>>
>> ________________________________
>> From: Steve Lawrence <sl...@apache.org>
>> Sent: Wednesday, October 7, 2020 6:17 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>> So why do we recommend initialize the runners in a singleton object and
>> having custom logic to clean them up when the test finishes? Why not
>> just put them in the class and let them be cleaned up by the GC when the
>> test finishes?
>>
>> On 10/7/20 6:11 PM, Beckerle, Mike wrote:
>>> The Runner objects are normally members of a scala singleton object. That way they are constructed once and shared by all the tests.
>>>
>>> My understanding is once a singleton object is required, it is constructed, and then never goes away for the life of the JVM (actually the class loader, but those aren't released typically either, so life of the JVM) (per https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>>>
>>> So as Runners are usually val or lazy val of these singletons, they will never go out of scope.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>>> Sent: Wednesday, October 7, 2020 5:51 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: RE: Daffodil-1300
>>>
>>>
>>> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal aspect of the Runner and DFDLTestSuite objects has called into question whether the shutdown routines are still needed or not. My understanding is that Scala is a garbage collected language, and should dump objects and classes which are no longer in use – so long as references to them are not being held by still-existing objects.
>>>
>>>
>>>
>>> The shutdown routine called by the Runner class simply invokes ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. Presumably this removes the reference to the DFDLTestSuite and allows it to be garbage collected, but in current cases, the Runner  containing the DFDLTestSuite will simply go out of scope once the block of tests is completed.
>>>
>>>
>>>
>>> The Runner objects are constructed as lazy val types, so it is my expectation that they will be constructed only when used – as in the TresysTests.scala file – lines 108-181 – and then be released at the end of that scope.
>>>
>>>
>>>
>>> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects still actually necessary?
>>>
>>>
>>>
>>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>>
>>> [Owl Cyber Defense]
>>>
>>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>>
>>> Connect with us!
>>>
>>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>>
>>>
>>>
>>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>>
>>>
>>>
>>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>>
>>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>>
>>> If you have received this transmission in error, please notify the sender immediately
>>>
>>>
>>>
>>>
>>>
>>> From: Beckerle, Mike<ma...@owlcyberdefense.com>
>>> Sent: Thursday, October 1, 2020 4:03 PM
>>> To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
>>> Subject: Re: Daffodil-1300
>>>
>>>
>>>
>>> I think the DFDLTestSuite constructor can't stay deprecated. We still need to construct them.
>>>
>>> It can become package private I think, so not usable outside the package.  I think the scala syntax for that is:
>>>
>>> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
>>> }
>>>
>>> The deprecation was just to help in hunting down the usages.
>>>
>>> If we need more Runner functions that's ok. But probably we can just depend on Scala's ability to pass arguments by specifying the argument names explicitly and getting defaults for the rest.  I think calls to Runner already do this in many cases.
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>>> Sent: Thursday, October 1, 2020 4:14 PM
>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> Subject: Daffodil-1300
>>>
>>>
>>> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and replacing them with usage of the “Runner” object. In order to help identify where the DFDLTestSuite was being invoked, I un-commented:
>>>
>>>
>>>
>>> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala line 198
>>>
>>> @deprecated("2016-12-30", "Use Runner(...) instead."
>>>
>>>
>>>
>>> This revealed that the Runner object itself invokes “new DFDLTestSuite” when the runOneTest function is called. This seems to defeat the purpose of this task. I’m hoping for some guidance or opinions on when, if ever, it’s appropriate to invoke the DFDLTestSuite constructor.
>>>
>>>
>>>
>>> Related to this – I believe I’ll need to create more constructors for the Runner object in order to preserve the argument combinations currently invoked by some of the new DFDLTestSuite calls – in particular when compileAllTopLevel must be provided as true when paired with an xml node instead of a file name.
>>>
>>>
>>>
>>> Is deprecating the DFDLTestSuite constructor still appropriate, or since this ticket was written have we changed or strategy such that it shouldn’t be called directly, but will need to be used by the Runner object?
>>>
>>>
>>>
>>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>>
>>> [Owl Cyber Defense]
>>>
>>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>>
>>> Connect with us!
>>>
>>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>>
>>>
>>>
>>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>>
>>>
>>>
>>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>>
>>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>>
>>> If you have received this transmission in error, please notify the sender immediately
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
> 
> 


Re: Daffodil-1300

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
I've depended, since the earliest days of Daffodil, on being able to click on one of the

    def test_foo() = { Runner.runOneTest("foo") }

And do "debug test" and have it run that single test in the breakpoint debugger.
I do this dozens of times a day when working on Daffodil using both eclipse, and using Intellij IDEA.  This is all leveraging IDE support for JUnit.

So whatever we do, that has to remain very convenient.

The redundancy of having the name of the test twice is annoying.  This would be better:

   @Test def foo() = r.run1

Where run1 with no arguments instructs it to derive the test name from the method name. That would require naming conventions for tests to be scala identifier-friendly, or mappable.

That's as small as I can make it knowing it would remain JUnit friendly.

I'm not sure the method name is available to a method by means other than using debugger/backtrace kinds of tricks.


________________________________
From: Sloane, Brandon <bs...@owlcyberdefense.com>
Sent: Thursday, October 8, 2020 12:30 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Daffodil-1300

I've used dynamic tests in a couple of schema projects I was working on.

My recollection is that the IDE integration made it difficult to run a single test, as you first need to generate all the tests; and there was not a good way to cache the results.

In the case of Daffodil, we also take advantage of the fact that we can comment out specifc tests. I don't think dynamic tests have a good way of doing that (although we can probably implement a test blacklist ourselves).

Another approach I have taken that seems to work better is to have a code generator produce the boilerplate test classes. We still need to figure out how to persistently disable tests, but this should avoid IDE integration issues.
________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, October 8, 2020 11:28 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Daffodil-1300

Skimming the Runner code, I'm not entirely convinced this is all working
as expected. But I think Ian is looking into that as part of this bug.
Seems like a reasonable time to confirm the complexity of our TDML
runner and all the caching is actually working right before updating all
the tests to use Runners.

That said, I'm wondering if maybe there is a better approach to our
testing infrastructure. I wonder if all of our boilerplate like this
could be removed:

  @Test test_foo = { runner.runOneTest("foo") }

This is all pretty much boilerplate.

It seems like it should be possible to have some sort of JUnit TestSuite
factory that simply scans for all the .tdml files, creates a unique
Runner for each tdml file (thus avoiding any duplication issues) and
then dynamically creates tests for all the parser/unparserTestCases. All
this scala stuff feels unnecessary to me.

Optional provided to the Runner (e.g. validation) can become attributes
on the tdml:testSuite attribute.

And we already have the "implementation" attribute that says which
implementations a tdml:parser/unparserTestCase works with. So rather
than commenting out tests in the scala file, we can just set
implementation="" to state it doesn't work with daffodil.

Though, I'm not sure if JUnit actually has a concept of dynamic test
suites (it does have dynamic tests), and how nicely that plays with
IDEs. I feel like we also shouldn't necessarily be tied to JUnit. While
maybe preferred, I would be willing to switch to something else if it
has this sort of capability and works with IDEs.



On 10/8/20 10:59 AM, Beckerle, Mike wrote:
> The point of runners and the object constructing them, was to share the XML-loading of the TDML, and computation of all the test suite objects from it, which was otherwise being loaded repeatedly and computed repeatedly.
>
> It is sensible to question this. There's a bunch of requirements we may have lost sight of, so let me see if I can review some of them here.
>
>   *    Some TDML files are *huge* and it is quite slow if the TDML gets loaded (as XML) multiple times. Clearly a TDML file can contain multiple tests and the work to load, parse, compile schemas in, and prepare for tests should happen exactly once per TDML file, regardless of how test drivers are used to invoke the individual tests.
>      *   Decomposing these into smaller files is desirable, but shouldn't be necessary.
>   *   Test drivers (the scala junit calls) that enable easy testing of a single test of a TDML file are essential to easy debugging. They must work from sbt and from IDEs
>   *   JUnit is the preferred idiom, not scalatest
>   *   Boilerplate in the scala test driver code is to be minimized. Nothing that doesn't have to be expressed should be expressed, management of objects and their lifetimes shouldn't be required. (This argues that the shutdown code shouldn't be required) The only things that should have to be specified are:
>      *   tdml file name and path
>      *   Options such as whether to validate the TDML or not, schema or not
>      *   for each junit test, the name of the test that appears in the TDML file.
>
>   *   Tests are generally run on multiple threads. Not just whole TDML suites at a time, but even within one TDML file, many Junit test drivers get kicked off simultaneously on multiple threads. This parallelism is needed to make the test suite run in a reasonable amount of time.
>      *   This cannot construct new copies of the entire DFDLTestSuite or the overhead would be too high.
>   *   The volume of tests and duration of the runs requires that test data structures such as the DFDLTestSuite objects be incrementally garbage collected.
>
> Requirements on the TDML runner (DFDLTestSuite and its methods) to behave properly and be coded properly may not be properly articulated.
>
> The DFDLTestSuite, and any objects it creates such as ParserTestCase and UnparserTestCase, should have state members only for digesting the definition of the test suite and test definitions. (i.e., should be all vals, or lazy vals, not vars, and they should not compute anything that comes from running the test, only from the test definitions.) They should be stateless otherwise.
>
> I'm not at all sure this is the case.
>
> There are multiple, possibly overlapping, ad-hoc fixes to these issues.
>
> There's a schema cache which tries to avoid recompiling the same DFDL schema over and over for various tests defined in the same TDML file. This seems to be needed because tests may use the same DFDL schema and same root element, but this isn't apparent except by examining the test case specifically. So some sort of cache is needed, as the same schema with different config options may be used.
>
> The Runner object is effectively trying to be a cache of the DFDLTestSuite object so that these are not constructed multiple times.
>
>
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Wednesday, October 7, 2020 6:17 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
>
> So why do we recommend initialize the runners in a singleton object and
> having custom logic to clean them up when the test finishes? Why not
> just put them in the class and let them be cleaned up by the GC when the
> test finishes?
>
> On 10/7/20 6:11 PM, Beckerle, Mike wrote:
>> The Runner objects are normally members of a scala singleton object. That way they are constructed once and shared by all the tests.
>>
>> My understanding is once a singleton object is required, it is constructed, and then never goes away for the life of the JVM (actually the class loader, but those aren't released typically either, so life of the JVM) (per https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>>
>> So as Runners are usually val or lazy val of these singletons, they will never go out of scope.
>>
>>
>>
>>
>>
>>
>>
>>
>> ________________________________
>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>> Sent: Wednesday, October 7, 2020 5:51 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: RE: Daffodil-1300
>>
>>
>> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal aspect of the Runner and DFDLTestSuite objects has called into question whether the shutdown routines are still needed or not. My understanding is that Scala is a garbage collected language, and should dump objects and classes which are no longer in use – so long as references to them are not being held by still-existing objects.
>>
>>
>>
>> The shutdown routine called by the Runner class simply invokes ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. Presumably this removes the reference to the DFDLTestSuite and allows it to be garbage collected, but in current cases, the Runner  containing the DFDLTestSuite will simply go out of scope once the block of tests is completed.
>>
>>
>>
>> The Runner objects are constructed as lazy val types, so it is my expectation that they will be constructed only when used – as in the TresysTests.scala file – lines 108-181 – and then be released at the end of that scope.
>>
>>
>>
>> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects still actually necessary?
>>
>>
>>
>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>
>> [Owl Cyber Defense]
>>
>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>
>> Connect with us!
>>
>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>
>>
>>
>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>
>>
>>
>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>
>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>
>> If you have received this transmission in error, please notify the sender immediately
>>
>>
>>
>>
>>
>> From: Beckerle, Mike<ma...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 4:03 PM
>> To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>>
>>
>> I think the DFDLTestSuite constructor can't stay deprecated. We still need to construct them.
>>
>> It can become package private I think, so not usable outside the package.  I think the scala syntax for that is:
>>
>> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
>> }
>>
>> The deprecation was just to help in hunting down the usages.
>>
>> If we need more Runner functions that's ok. But probably we can just depend on Scala's ability to pass arguments by specifying the argument names explicitly and getting defaults for the rest.  I think calls to Runner already do this in many cases.
>>
>>
>>
>>
>>
>> ________________________________
>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 4:14 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Daffodil-1300
>>
>>
>> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and replacing them with usage of the “Runner” object. In order to help identify where the DFDLTestSuite was being invoked, I un-commented:
>>
>>
>>
>> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala line 198
>>
>> @deprecated("2016-12-30", "Use Runner(...) instead."
>>
>>
>>
>> This revealed that the Runner object itself invokes “new DFDLTestSuite” when the runOneTest function is called. This seems to defeat the purpose of this task. I’m hoping for some guidance or opinions on when, if ever, it’s appropriate to invoke the DFDLTestSuite constructor.
>>
>>
>>
>> Related to this – I believe I’ll need to create more constructors for the Runner object in order to preserve the argument combinations currently invoked by some of the new DFDLTestSuite calls – in particular when compileAllTopLevel must be provided as true when paired with an xml node instead of a file name.
>>
>>
>>
>> Is deprecating the DFDLTestSuite constructor still appropriate, or since this ticket was written have we changed or strategy such that it shouldn’t be called directly, but will need to be used by the Runner object?
>>
>>
>>
>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>
>> [Owl Cyber Defense]
>>
>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>
>> Connect with us!
>>
>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>
>>
>>
>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>
>>
>>
>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>
>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>
>> If you have received this transmission in error, please notify the sender immediately
>>
>>
>>
>>
>>
>>
>
>


Re: Daffodil-1300

Posted by "Sloane, Brandon" <bs...@owlcyberdefense.com>.
I've used dynamic tests in a couple of schema projects I was working on.

My recollection is that the IDE integration made it difficult to run a single test, as you first need to generate all the tests; and there was not a good way to cache the results.

In the case of Daffodil, we also take advantage of the fact that we can comment out specifc tests. I don't think dynamic tests have a good way of doing that (although we can probably implement a test blacklist ourselves).

Another approach I have taken that seems to work better is to have a code generator produce the boilerplate test classes. We still need to figure out how to persistently disable tests, but this should avoid IDE integration issues.
________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, October 8, 2020 11:28 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Daffodil-1300

Skimming the Runner code, I'm not entirely convinced this is all working
as expected. But I think Ian is looking into that as part of this bug.
Seems like a reasonable time to confirm the complexity of our TDML
runner and all the caching is actually working right before updating all
the tests to use Runners.

That said, I'm wondering if maybe there is a better approach to our
testing infrastructure. I wonder if all of our boilerplate like this
could be removed:

  @Test test_foo = { runner.runOneTest("foo") }

This is all pretty much boilerplate.

It seems like it should be possible to have some sort of JUnit TestSuite
factory that simply scans for all the .tdml files, creates a unique
Runner for each tdml file (thus avoiding any duplication issues) and
then dynamically creates tests for all the parser/unparserTestCases. All
this scala stuff feels unnecessary to me.

Optional provided to the Runner (e.g. validation) can become attributes
on the tdml:testSuite attribute.

And we already have the "implementation" attribute that says which
implementations a tdml:parser/unparserTestCase works with. So rather
than commenting out tests in the scala file, we can just set
implementation="" to state it doesn't work with daffodil.

Though, I'm not sure if JUnit actually has a concept of dynamic test
suites (it does have dynamic tests), and how nicely that plays with
IDEs. I feel like we also shouldn't necessarily be tied to JUnit. While
maybe preferred, I would be willing to switch to something else if it
has this sort of capability and works with IDEs.



On 10/8/20 10:59 AM, Beckerle, Mike wrote:
> The point of runners and the object constructing them, was to share the XML-loading of the TDML, and computation of all the test suite objects from it, which was otherwise being loaded repeatedly and computed repeatedly.
>
> It is sensible to question this. There's a bunch of requirements we may have lost sight of, so let me see if I can review some of them here.
>
>   *    Some TDML files are *huge* and it is quite slow if the TDML gets loaded (as XML) multiple times. Clearly a TDML file can contain multiple tests and the work to load, parse, compile schemas in, and prepare for tests should happen exactly once per TDML file, regardless of how test drivers are used to invoke the individual tests.
>      *   Decomposing these into smaller files is desirable, but shouldn't be necessary.
>   *   Test drivers (the scala junit calls) that enable easy testing of a single test of a TDML file are essential to easy debugging. They must work from sbt and from IDEs
>   *   JUnit is the preferred idiom, not scalatest
>   *   Boilerplate in the scala test driver code is to be minimized. Nothing that doesn't have to be expressed should be expressed, management of objects and their lifetimes shouldn't be required. (This argues that the shutdown code shouldn't be required) The only things that should have to be specified are:
>      *   tdml file name and path
>      *   Options such as whether to validate the TDML or not, schema or not
>      *   for each junit test, the name of the test that appears in the TDML file.
>
>   *   Tests are generally run on multiple threads. Not just whole TDML suites at a time, but even within one TDML file, many Junit test drivers get kicked off simultaneously on multiple threads. This parallelism is needed to make the test suite run in a reasonable amount of time.
>      *   This cannot construct new copies of the entire DFDLTestSuite or the overhead would be too high.
>   *   The volume of tests and duration of the runs requires that test data structures such as the DFDLTestSuite objects be incrementally garbage collected.
>
> Requirements on the TDML runner (DFDLTestSuite and its methods) to behave properly and be coded properly may not be properly articulated.
>
> The DFDLTestSuite, and any objects it creates such as ParserTestCase and UnparserTestCase, should have state members only for digesting the definition of the test suite and test definitions. (i.e., should be all vals, or lazy vals, not vars, and they should not compute anything that comes from running the test, only from the test definitions.) They should be stateless otherwise.
>
> I'm not at all sure this is the case.
>
> There are multiple, possibly overlapping, ad-hoc fixes to these issues.
>
> There's a schema cache which tries to avoid recompiling the same DFDL schema over and over for various tests defined in the same TDML file. This seems to be needed because tests may use the same DFDL schema and same root element, but this isn't apparent except by examining the test case specifically. So some sort of cache is needed, as the same schema with different config options may be used.
>
> The Runner object is effectively trying to be a cache of the DFDLTestSuite object so that these are not constructed multiple times.
>
>
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Wednesday, October 7, 2020 6:17 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
>
> So why do we recommend initialize the runners in a singleton object and
> having custom logic to clean them up when the test finishes? Why not
> just put them in the class and let them be cleaned up by the GC when the
> test finishes?
>
> On 10/7/20 6:11 PM, Beckerle, Mike wrote:
>> The Runner objects are normally members of a scala singleton object. That way they are constructed once and shared by all the tests.
>>
>> My understanding is once a singleton object is required, it is constructed, and then never goes away for the life of the JVM (actually the class loader, but those aren't released typically either, so life of the JVM) (per https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>>
>> So as Runners are usually val or lazy val of these singletons, they will never go out of scope.
>>
>>
>>
>>
>>
>>
>>
>>
>> ________________________________
>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>> Sent: Wednesday, October 7, 2020 5:51 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: RE: Daffodil-1300
>>
>>
>> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal aspect of the Runner and DFDLTestSuite objects has called into question whether the shutdown routines are still needed or not. My understanding is that Scala is a garbage collected language, and should dump objects and classes which are no longer in use – so long as references to them are not being held by still-existing objects.
>>
>>
>>
>> The shutdown routine called by the Runner class simply invokes ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. Presumably this removes the reference to the DFDLTestSuite and allows it to be garbage collected, but in current cases, the Runner  containing the DFDLTestSuite will simply go out of scope once the block of tests is completed.
>>
>>
>>
>> The Runner objects are constructed as lazy val types, so it is my expectation that they will be constructed only when used – as in the TresysTests.scala file – lines 108-181 – and then be released at the end of that scope.
>>
>>
>>
>> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects still actually necessary?
>>
>>
>>
>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>
>> [Owl Cyber Defense]
>>
>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>
>> Connect with us!
>>
>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>
>>
>>
>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>
>>
>>
>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>
>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>
>> If you have received this transmission in error, please notify the sender immediately
>>
>>
>>
>>
>>
>> From: Beckerle, Mike<ma...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 4:03 PM
>> To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>>
>>
>> I think the DFDLTestSuite constructor can't stay deprecated. We still need to construct them.
>>
>> It can become package private I think, so not usable outside the package.  I think the scala syntax for that is:
>>
>> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
>> }
>>
>> The deprecation was just to help in hunting down the usages.
>>
>> If we need more Runner functions that's ok. But probably we can just depend on Scala's ability to pass arguments by specifying the argument names explicitly and getting defaults for the rest.  I think calls to Runner already do this in many cases.
>>
>>
>>
>>
>>
>> ________________________________
>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 4:14 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Daffodil-1300
>>
>>
>> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and replacing them with usage of the “Runner” object. In order to help identify where the DFDLTestSuite was being invoked, I un-commented:
>>
>>
>>
>> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala line 198
>>
>> @deprecated("2016-12-30", "Use Runner(...) instead."
>>
>>
>>
>> This revealed that the Runner object itself invokes “new DFDLTestSuite” when the runOneTest function is called. This seems to defeat the purpose of this task. I’m hoping for some guidance or opinions on when, if ever, it’s appropriate to invoke the DFDLTestSuite constructor.
>>
>>
>>
>> Related to this – I believe I’ll need to create more constructors for the Runner object in order to preserve the argument combinations currently invoked by some of the new DFDLTestSuite calls – in particular when compileAllTopLevel must be provided as true when paired with an xml node instead of a file name.
>>
>>
>>
>> Is deprecating the DFDLTestSuite constructor still appropriate, or since this ticket was written have we changed or strategy such that it shouldn’t be called directly, but will need to be used by the Runner object?
>>
>>
>>
>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>
>> [Owl Cyber Defense]
>>
>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>
>> Connect with us!
>>
>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>
>>
>>
>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>
>>
>>
>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>
>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>
>> If you have received this transmission in error, please notify the sender immediately
>>
>>
>>
>>
>>
>>
>
>


Re: Daffodil-1300

Posted by Steve Lawrence <sl...@apache.org>.
Skimming the Runner code, I'm not entirely convinced this is all working
as expected. But I think Ian is looking into that as part of this bug.
Seems like a reasonable time to confirm the complexity of our TDML
runner and all the caching is actually working right before updating all
the tests to use Runners.

That said, I'm wondering if maybe there is a better approach to our
testing infrastructure. I wonder if all of our boilerplate like this
could be removed:

  @Test test_foo = { runner.runOneTest("foo") }

This is all pretty much boilerplate.

It seems like it should be possible to have some sort of JUnit TestSuite
factory that simply scans for all the .tdml files, creates a unique
Runner for each tdml file (thus avoiding any duplication issues) and
then dynamically creates tests for all the parser/unparserTestCases. All
this scala stuff feels unnecessary to me.

Optional provided to the Runner (e.g. validation) can become attributes
on the tdml:testSuite attribute.

And we already have the "implementation" attribute that says which
implementations a tdml:parser/unparserTestCase works with. So rather
than commenting out tests in the scala file, we can just set
implementation="" to state it doesn't work with daffodil.

Though, I'm not sure if JUnit actually has a concept of dynamic test
suites (it does have dynamic tests), and how nicely that plays with
IDEs. I feel like we also shouldn't necessarily be tied to JUnit. While
maybe preferred, I would be willing to switch to something else if it
has this sort of capability and works with IDEs.



On 10/8/20 10:59 AM, Beckerle, Mike wrote:
> The point of runners and the object constructing them, was to share the XML-loading of the TDML, and computation of all the test suite objects from it, which was otherwise being loaded repeatedly and computed repeatedly.
> 
> It is sensible to question this. There's a bunch of requirements we may have lost sight of, so let me see if I can review some of them here.
> 
>   *    Some TDML files are *huge* and it is quite slow if the TDML gets loaded (as XML) multiple times. Clearly a TDML file can contain multiple tests and the work to load, parse, compile schemas in, and prepare for tests should happen exactly once per TDML file, regardless of how test drivers are used to invoke the individual tests.
>      *   Decomposing these into smaller files is desirable, but shouldn't be necessary.
>   *   Test drivers (the scala junit calls) that enable easy testing of a single test of a TDML file are essential to easy debugging. They must work from sbt and from IDEs
>   *   JUnit is the preferred idiom, not scalatest
>   *   Boilerplate in the scala test driver code is to be minimized. Nothing that doesn't have to be expressed should be expressed, management of objects and their lifetimes shouldn't be required. (This argues that the shutdown code shouldn't be required) The only things that should have to be specified are:
>      *   tdml file name and path
>      *   Options such as whether to validate the TDML or not, schema or not
>      *   for each junit test, the name of the test that appears in the TDML file.
> 
>   *   Tests are generally run on multiple threads. Not just whole TDML suites at a time, but even within one TDML file, many Junit test drivers get kicked off simultaneously on multiple threads. This parallelism is needed to make the test suite run in a reasonable amount of time.
>      *   This cannot construct new copies of the entire DFDLTestSuite or the overhead would be too high.
>   *   The volume of tests and duration of the runs requires that test data structures such as the DFDLTestSuite objects be incrementally garbage collected.
> 
> Requirements on the TDML runner (DFDLTestSuite and its methods) to behave properly and be coded properly may not be properly articulated.
> 
> The DFDLTestSuite, and any objects it creates such as ParserTestCase and UnparserTestCase, should have state members only for digesting the definition of the test suite and test definitions. (i.e., should be all vals, or lazy vals, not vars, and they should not compute anything that comes from running the test, only from the test definitions.) They should be stateless otherwise.
> 
> I'm not at all sure this is the case.
> 
> There are multiple, possibly overlapping, ad-hoc fixes to these issues.
> 
> There's a schema cache which tries to avoid recompiling the same DFDL schema over and over for various tests defined in the same TDML file. This seems to be needed because tests may use the same DFDL schema and same root element, but this isn't apparent except by examining the test case specifically. So some sort of cache is needed, as the same schema with different config options may be used.
> 
> The Runner object is effectively trying to be a cache of the DFDLTestSuite object so that these are not constructed multiple times.
> 
> 
> 
> 
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Wednesday, October 7, 2020 6:17 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
> 
> So why do we recommend initialize the runners in a singleton object and
> having custom logic to clean them up when the test finishes? Why not
> just put them in the class and let them be cleaned up by the GC when the
> test finishes?
> 
> On 10/7/20 6:11 PM, Beckerle, Mike wrote:
>> The Runner objects are normally members of a scala singleton object. That way they are constructed once and shared by all the tests.
>>
>> My understanding is once a singleton object is required, it is constructed, and then never goes away for the life of the JVM (actually the class loader, but those aren't released typically either, so life of the JVM) (per https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>>
>> So as Runners are usually val or lazy val of these singletons, they will never go out of scope.
>>
>>
>>
>>
>>
>>
>>
>>
>> ________________________________
>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>> Sent: Wednesday, October 7, 2020 5:51 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: RE: Daffodil-1300
>>
>>
>> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal aspect of the Runner and DFDLTestSuite objects has called into question whether the shutdown routines are still needed or not. My understanding is that Scala is a garbage collected language, and should dump objects and classes which are no longer in use – so long as references to them are not being held by still-existing objects.
>>
>>
>>
>> The shutdown routine called by the Runner class simply invokes ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. Presumably this removes the reference to the DFDLTestSuite and allows it to be garbage collected, but in current cases, the Runner  containing the DFDLTestSuite will simply go out of scope once the block of tests is completed.
>>
>>
>>
>> The Runner objects are constructed as lazy val types, so it is my expectation that they will be constructed only when used – as in the TresysTests.scala file – lines 108-181 – and then be released at the end of that scope.
>>
>>
>>
>> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects still actually necessary?
>>
>>
>>
>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>
>> [Owl Cyber Defense]
>>
>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>
>> Connect with us!
>>
>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>
>>
>>
>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>
>>
>>
>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>
>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>
>> If you have received this transmission in error, please notify the sender immediately
>>
>>
>>
>>
>>
>> From: Beckerle, Mike<ma...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 4:03 PM
>> To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
>> Subject: Re: Daffodil-1300
>>
>>
>>
>> I think the DFDLTestSuite constructor can't stay deprecated. We still need to construct them.
>>
>> It can become package private I think, so not usable outside the package.  I think the scala syntax for that is:
>>
>> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
>> }
>>
>> The deprecation was just to help in hunting down the usages.
>>
>> If we need more Runner functions that's ok. But probably we can just depend on Scala's ability to pass arguments by specifying the argument names explicitly and getting defaults for the rest.  I think calls to Runner already do this in many cases.
>>
>>
>>
>>
>>
>> ________________________________
>> From: Carlson, Ian <ic...@owlcyberdefense.com>
>> Sent: Thursday, October 1, 2020 4:14 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Daffodil-1300
>>
>>
>> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and replacing them with usage of the “Runner” object. In order to help identify where the DFDLTestSuite was being invoked, I un-commented:
>>
>>
>>
>> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala line 198
>>
>> @deprecated("2016-12-30", "Use Runner(...) instead."
>>
>>
>>
>> This revealed that the Runner object itself invokes “new DFDLTestSuite” when the runOneTest function is called. This seems to defeat the purpose of this task. I’m hoping for some guidance or opinions on when, if ever, it’s appropriate to invoke the DFDLTestSuite constructor.
>>
>>
>>
>> Related to this – I believe I’ll need to create more constructors for the Runner object in order to preserve the argument combinations currently invoked by some of the new DFDLTestSuite calls – in particular when compileAllTopLevel must be provided as true when paired with an xml node instead of a file name.
>>
>>
>>
>> Is deprecating the DFDLTestSuite constructor still appropriate, or since this ticket was written have we changed or strategy such that it shouldn’t be called directly, but will need to be used by the Runner object?
>>
>>
>>
>> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>>
>> [Owl Cyber Defense]
>>
>> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>>
>> Connect with us!
>>
>> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>>
>>
>>
>> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>>
>>
>>
>> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>>
>> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>>
>> If you have received this transmission in error, please notify the sender immediately
>>
>>
>>
>>
>>
>>
> 
> 


Re: Daffodil-1300

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
The point of runners and the object constructing them, was to share the XML-loading of the TDML, and computation of all the test suite objects from it, which was otherwise being loaded repeatedly and computed repeatedly.

It is sensible to question this. There's a bunch of requirements we may have lost sight of, so let me see if I can review some of them here.

  *    Some TDML files are *huge* and it is quite slow if the TDML gets loaded (as XML) multiple times. Clearly a TDML file can contain multiple tests and the work to load, parse, compile schemas in, and prepare for tests should happen exactly once per TDML file, regardless of how test drivers are used to invoke the individual tests.
     *   Decomposing these into smaller files is desirable, but shouldn't be necessary.
  *   Test drivers (the scala junit calls) that enable easy testing of a single test of a TDML file are essential to easy debugging. They must work from sbt and from IDEs
  *   JUnit is the preferred idiom, not scalatest
  *   Boilerplate in the scala test driver code is to be minimized. Nothing that doesn't have to be expressed should be expressed, management of objects and their lifetimes shouldn't be required. (This argues that the shutdown code shouldn't be required) The only things that should have to be specified are:
     *   tdml file name and path
     *   Options such as whether to validate the TDML or not, schema or not
     *   for each junit test, the name of the test that appears in the TDML file.

  *   Tests are generally run on multiple threads. Not just whole TDML suites at a time, but even within one TDML file, many Junit test drivers get kicked off simultaneously on multiple threads. This parallelism is needed to make the test suite run in a reasonable amount of time.
     *   This cannot construct new copies of the entire DFDLTestSuite or the overhead would be too high.
  *   The volume of tests and duration of the runs requires that test data structures such as the DFDLTestSuite objects be incrementally garbage collected.

Requirements on the TDML runner (DFDLTestSuite and its methods) to behave properly and be coded properly may not be properly articulated.

The DFDLTestSuite, and any objects it creates such as ParserTestCase and UnparserTestCase, should have state members only for digesting the definition of the test suite and test definitions. (i.e., should be all vals, or lazy vals, not vars, and they should not compute anything that comes from running the test, only from the test definitions.) They should be stateless otherwise.

I'm not at all sure this is the case.

There are multiple, possibly overlapping, ad-hoc fixes to these issues.

There's a schema cache which tries to avoid recompiling the same DFDL schema over and over for various tests defined in the same TDML file. This seems to be needed because tests may use the same DFDL schema and same root element, but this isn't apparent except by examining the test case specifically. So some sort of cache is needed, as the same schema with different config options may be used.

The Runner object is effectively trying to be a cache of the DFDLTestSuite object so that these are not constructed multiple times.




________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Wednesday, October 7, 2020 6:17 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: Daffodil-1300

So why do we recommend initialize the runners in a singleton object and
having custom logic to clean them up when the test finishes? Why not
just put them in the class and let them be cleaned up by the GC when the
test finishes?

On 10/7/20 6:11 PM, Beckerle, Mike wrote:
> The Runner objects are normally members of a scala singleton object. That way they are constructed once and shared by all the tests.
>
> My understanding is once a singleton object is required, it is constructed, and then never goes away for the life of the JVM (actually the class loader, but those aren't released typically either, so life of the JVM) (per https://stackoverflow.com/questions/3956652/when-are-scala-objects-garbage-collected)
>
> So as Runners are usually val or lazy val of these singletons, they will never go out of scope.
>
>
>
>
>
>
>
>
> ________________________________
> From: Carlson, Ian <ic...@owlcyberdefense.com>
> Sent: Wednesday, October 7, 2020 5:51 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: RE: Daffodil-1300
>
>
> A conversation with Steve Lawrence, and digging deeper into the ThreadLocal aspect of the Runner and DFDLTestSuite objects has called into question whether the shutdown routines are still needed or not. My understanding is that Scala is a garbage collected language, and should dump objects and classes which are no longer in use – so long as references to them are not being held by still-existing objects.
>
>
>
> The shutdown routine called by the Runner class simply invokes ts_ts.set(null), where tl_ts is of type ThreadLocal[DFDLTestSuite]. Presumably this removes the reference to the DFDLTestSuite and allows it to be garbage collected, but in current cases, the Runner  containing the DFDLTestSuite will simply go out of scope once the block of tests is completed.
>
>
>
> The Runner objects are constructed as lazy val types, so it is my expectation that they will be constructed only when used – as in the TresysTests.scala file – lines 108-181 – and then be released at the end of that scope.
>
>
>
> So is the invocation of .set(null) on the ThreadLocal[DFDLTestSuite] objects still actually necessary?
>
>
>
> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>
> [Owl Cyber Defense]
>
> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>
> Connect with us!
>
> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>
>
>
> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>
>
>
> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>
> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>
> If you have received this transmission in error, please notify the sender immediately
>
>
>
>
>
> From: Beckerle, Mike<ma...@owlcyberdefense.com>
> Sent: Thursday, October 1, 2020 4:03 PM
> To: dev@daffodil.apache.org<ma...@daffodil.apache.org>
> Subject: Re: Daffodil-1300
>
>
>
> I think the DFDLTestSuite constructor can't stay deprecated. We still need to construct them.
>
> It can become package private I think, so not usable outside the package.  I think the scala syntax for that is:
>
> class Foo private[pkgName] (arg1: Type1, arg2: Type2) { ...
> }
>
> The deprecation was just to help in hunting down the usages.
>
> If we need more Runner functions that's ok. But probably we can just depend on Scala's ability to pass arguments by specifying the argument names explicitly and getting defaults for the rest.  I think calls to Runner already do this in many cases.
>
>
>
>
>
> ________________________________
> From: Carlson, Ian <ic...@owlcyberdefense.com>
> Sent: Thursday, October 1, 2020 4:14 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Daffodil-1300
>
>
> Regarding DAFFODIL-1300, I’m hunting down usage of “new DFDLTestSuite” and replacing them with usage of the “Runner” object. In order to help identify where the DFDLTestSuite was being invoked, I un-commented:
>
>
>
> daffodil-tdml-lib/src/main/scala/org/apache/daffodil/tdml/TDMLRunner.scala line 198
>
> @deprecated("2016-12-30", "Use Runner(...) instead."
>
>
>
> This revealed that the Runner object itself invokes “new DFDLTestSuite” when the runOneTest function is called. This seems to defeat the purpose of this task. I’m hoping for some guidance or opinions on when, if ever, it’s appropriate to invoke the DFDLTestSuite constructor.
>
>
>
> Related to this – I believe I’ll need to create more constructors for the Runner object in order to preserve the argument combinations currently invoked by some of the new DFDLTestSuite calls – in particular when compileAllTopLevel must be provided as true when paired with an xml node instead of a file name.
>
>
>
> Is deprecating the DFDLTestSuite constructor still appropriate, or since this ticket was written have we changed or strategy such that it shouldn’t be called directly, but will need to be used by the Runner object?
>
>
>
> [A picture containing object, clock  Description automatically generated]          Ian Carlson | Software Engineer
>
> [Owl Cyber Defense]
>
> W  icarlson@owlcyberdefense.com<https://owlcyberdefense.com/>
>
> Connect with us!
>
> Facebook<https://www.facebook.com/owlcyberdefense/> | LinkedIn<https://www.linkedin.com/company/owlcyberdefense/> | Twitter<https://twitter.com/owlcyberdefense>
>
>
>
> [Find us at our next event. Click Here.]<https://owlcyberdefense.com/resources/events/?utm_source=owl-cyber-defense&utm_medium=email&utm_content=banner&utm_campaign=2020-events>
>
>
>
> The information contained in this transmission is for the personal and confidential use of the individual or entity to which it is addressed.
>
> If the reader is not the intended recipient, you are hereby notified that any review, dissemination, or copying of this communication is strictly prohibited.
>
> If you have received this transmission in error, please notify the sender immediately
>
>
>
>
>
>